Stochastic Mechanics 
Random Media 
Signal Processing 
and Image Synthesis 
Mathematical Economics and Finance 
Stochastic Optimization 
Stochastic Control 
Stochastic Models in Life Sciences 



Applications of 
Mathematics 



Stochastic Modelling 
and Applied Probability 




Edited by B. Rozovskii 
M. Yor 



Advisory Board 



D. Dawson 

D. Geman 
G. Grimmett 
I. Karatzas 

F. Kelly
Y. Le Jan 

E. Pardoux 

G. Papanicolaou 



Springer-Verlag Berlin Heidelberg GmbH



Applications of Mathematics 



1 Fleming/Rishel, Deterministic and Stochastic Optimal Control (1975) 

2 Marchuk, Methods of Numerical Mathematics, Second Edition (1982) 

3 Balakrishnan, Applied Functional Analysis, Second Edition (1981) 

4 Borovkov, Stochastic Processes in Queueing Theory (1976) 

5 Liptser/Shiryaev, Statistics of Random Processes I: General Theory (2000) first ed. 1977 

6 Liptser/Shiryaev, Statistics of Random Processes II: Applications (2000) first ed. 1978 

7 Vorob’ev, Game Theory: Lectures for Economists and Systems Scientists (1977) 

8 Shiryaev, Optimal Stopping Rules (1978) 

9 Ibragimov/Rozanov, Gaussian Random Processes (1978) 

10 Wonham, Linear Multivariable Control: A Geometric Approach, Third Edition (1985) 

11 Hida, Brownian Motion (1980) 

12 Hestenes, Conjugate Direction Methods in Optimization (1980) 

13 Kallianpur, Stochastic Filtering Theory (1980) 

14 Krylov, Controlled Diffusion Processes (1980) 

15 Prabhu, Stochastic Storage Processes: Queues, Insurance Risk, and Dams (1980) 

16 Ibragimov/Has’minskii, Statistical Estimation: Asymptotic Theory (1981) 

17 Cesari, Optimization: Theory and Applications (1982) 

18 Elliott, Stochastic Calculus and Applications (1982) 

19 Marchuk/Shaidourov, Difference Methods and Their Extrapolations (1983) 

20 Hijab, Stabilization of Control Systems (1986) 

21 Protter, Stochastic Integration and Differential Equations (1990) 

22 Benveniste/Metivier/Priouret, Adaptive Algorithms and Stochastic Approximations (1990) 

23 Kloeden/Platen, Numerical Solution of Stochastic Differential Equations (1992) corrected 
3rd printing 1999 

24 Kushner/Dupuis, Numerical Methods for Stochastic Control Problems in Continuous 
Time (1992) 

25 Fleming/Soner, Controlled Markov Processes and Viscosity Solutions (1993) 

26 Baccelli/Bremaud, Elements of Queueing Theory (1994) 

27 Winkler, Image Analysis, Random Fields and Dynamic Monte Carlo Methods (2003) 
first ed. 1995 

28 Kalpazidou, Cycle Representations of Markov Processes (1995) 

29 Elliott/Aggoun/Moore, Hidden Markov Models: Estimation and Control (1995)

30 Hernandez-Lerma/Lasserre, Discrete-Time Markov Control Processes (1995) 

31 Devroye/Gyorfi/Lugosi, A Probabilistic Theory of Pattern Recognition (1996) 

32 Maitra/Sudderth, Discrete Gambling and Stochastic Games (1996) 

33 Embrechts/Klüppelberg/Mikosch, Modelling Extremal Events for Insurance and Finance
(1997) corrected 2nd printing 1999

34 Duflo, Random Iterative Models (1997) 

35 Kushner/Yin, Stochastic Approximation Algorithms and Applications (1997) 

36 Musiela/Rutkowski, Martingale Methods in Financial Modelling (1997) 

37 Yin, Continuous-Time Markov Chains and Applications (1998) 

38 Dembo/Zeitouni, Large Deviations Techniques and Applications (1998) 

39 Karatzas, Methods of Mathematical Finance (1998) 

40 Fayolle/Iasnogorodski/Malyshev, Random Walks in the Quarter-Plane (1999) 

41 Aven/Jensen, Stochastic Models in Reliability (1999)

42 Hernandez-Lerma/Lasserre, Further Topics on Discrete-Time Markov Control Processes
(1999)

43 Yong/Zhou, Stochastic Controls: Hamiltonian Systems and HJB Equations (1999)

44 Serfozo, Introduction to Stochastic Networks (1999) 

45 Steele, Stochastic Calculus and Financial Applications (2000) 

46 Chen/Yao, Fundamentals of Queuing Networks: Performance, Asymptotics, and
Optimization (2001)

47 Kushner, Heavy Traffic Analysis of Controlled Queueing and Communications Networks 
(2001) 

48 Fernholz, Stochastic Portfolio Theory (2002) 

49 Kabanov/Pergamenshchikov, Two-scale Stochastic Systems (2002) 

50 Han, Information-Spectrum Methods in Information Theory (2002)



Te Sun Han 



Information-Spectrum Methods
in Information Theory



Translated from the Japanese by Hiroki Koga 



Author 
Te Sun Han 

University of Electro-Communications 
Graduate School of Information Systems 
Chofugaoka 1-5-1 
182-8585 Tokyo, Japan 

e-mail: han@hn.is.uec.ac.jp 



Managing Editors 
B. Rozovskii 

Center for Applied Mathematical 
Sciences 

University of Southern California 
1042 West 36th Place, 

Denney Research Building 308 
Los Angeles, CA 90089, USA 



M. Yor 

Laboratoire de Probabilités
Université Pierre et Marie Curie
16 rue Clisson 
F-75004 Paris, France 



Mathematics Subject Classification (2000): 60XX, 62XX, 94XX 

Originally published in Japanese by Baihukan, Publishers, Tokyo in 1998 
under the title 

JOHORIRON NI OKERU JOHO SUPEKUTORU TEKI HOHO 
Cover pattern by courtesy of Rick Durrett (Cornell University, Ithaca) 
Library of Congress Cataloging-in-Publication Data 
Die Deutsche Bibliothek - CIP-Einheitsaufnahme 

Han, Te Sun: 

Information spectrum method in information theory / Te Sun Han. - 



(Applications of mathematics ; 50) 

ISBN 978-3-642-07812-5 ISBN 978-3-662-12066-8 (eBook) 

DOI 10.1007/978-3-662-12066-8 

ISSN 0172-4568 


This work is subject to copyright. All rights are reserved, whether the whole or part of the 
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, 
recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data 
banks. Duplication of this publication or parts thereof is permitted only under the provisions 
of the German Copyright Law of September 9, 1965, in its current version, and permission 
for use must always be obtained from Springer-Verlag Berlin Heidelberg GmbH.

Violations are liable for prosecution under the German Copyright Law. 



http://www.springer.de 
© Springer-Verlag Berlin Heidelberg 2003 

Originally published by Springer-Verlag Berlin Heidelberg New York in 2003 

The use of general descriptive names, registered names, trademarks, etc. in this publication 
does not imply, even in the absence of a specific statement, that such names are exempt from 
the relevant protective laws and regulations and therefore free for general use. 

Cover design: Erich Kirchner, Heidelberg 

Typesetting by the author using a Springer TeX macro package

Printed on acid-free paper SPIN: 10874930 41/3142db-543210 



To my parents and Okhee 



Preface to the English Edition^1



Since information theory, whose theoretical and mathematical core is
currently called Shannon theory, was initiated in 1948 by Shannon [77], five
decades have elapsed. The advancement of information theory during this
period has been remarkable, and the field seems to have reached a kind of
maturity. A huge number of technical and sophisticated results have been
reported in various styles and from various viewpoints, which are not easy to
follow for students and researchers of information theory, information
technology, and related fields such as communication theory, communication
technology, statistics, computer science, and applied mathematics. Therefore,
in order for the fruits of Shannon theory to serve as a seed for new
developments, a process of recapturing the existing results is inevitable.

On the other hand, in parallel with this advancement during the same 
period, many excellent textbooks of information theory have been published, 
typically, such as Fano [27], Pinsker [76], Abramson [1], Jelinek [55], Ash [6], 
Gallager [30], Wolfowitz [102], McEliece [64], Csiszár and Körner [19], Blahut
[11], Cover and Thomas [17], Gray [31], Shields [81], and so on. One of the 
fundamental technical tools here is the asymptotic equipartition property 
or equivalently the notion of typical sequences, which effectively works on 
the basis of the weak law of large numbers. Nowadays these textbooks are 
regarded as providing the standard basis of information theory, in which we 
can find many simple and very elegant formulas, for example, those for source 
coding, channel coding, rate-distortion function, hypothesis testing, etc. 

However, almost all textbooks assume that both the source X and the channel W
are stationary and/or ergodic (or memoryless), typically with finite alphabets.
The above simplicity and elegance mainly come from these assumptions, while
only a few of the standard textbooks mention, rather as side issues, several
special classes of sources and channels that are nonstationary or nonergodic.^2

^1 This Preface is partly based on joyful discussions with Hiroshi Nagaoka.

^2 For example, Gray [31] analyzes the details of “asymptotically mean stationary”
sources (AMS), which are in general neither stationary nor ergodic.

The state of affairs as above tempted me to try to write this book, which
is intended to take one step toward constructing the skeleton of
“a” unified general Shannon theory in which sources and channels may be
arbitrarily nonstationary and/or arbitrarily nonergodic with arbitrarily given
abstract source/channel alphabets. Some people may say that such a tremendous
generalization never leads us to any fertile results, and others
may say that it leads us just to a boring mathematical overgeneralization.

Nevertheless, our claim is that we can successfully reach a highly uncon- 
ventional but very fertile new world even under (or owing to) such a tremen- 
dous generalization. In other words, this book focuses on general nonstation- 
ary and/or nonergodic sources and channels, to establish a unified general 
treatment of a collection of mostly known results in the literature. This kind 
of treatment renders or is expected to render new insights into many prob- 
lems that are not obtainable otherwise, where the standard concept of typical 
sequences can be completely dispensed with. 

In order to construct such a general theory, we need to widen our viewpoint,
which is called the information-spectrum method.^3 In addition,
for any sequence $(Z_1, Z_2, Z_3, \ldots)$ of real-valued random variables, we need
to introduce the following two fundamental probabilistic limit operations,
i.e., the limit superior in probability $\operatorname*{p-lim\,sup}_{n\to\infty} Z_n$ defined by

$$\operatorname*{p-lim\,sup}_{n\to\infty} Z_n = \inf\Bigl\{\alpha \Bigm| \lim_{n\to\infty} \Pr\{Z_n > \alpha\} = 0\Bigr\}, \tag{1}$$

and the limit inferior in probability $\operatorname*{p-lim\,inf}_{n\to\infty} Z_n$ defined by

$$\operatorname*{p-lim\,inf}_{n\to\infty} Z_n = \sup\Bigl\{\beta \Bigm| \lim_{n\to\infty} \Pr\{Z_n < \beta\} = 0\Bigr\}. \tag{2}$$

These limit operations,^4 to my best knowledge, have never appeared in
any books on probability theory or information theory. Why is this? The reason
would simply be that probability theorists (or information theorists) have
never considered the situation in which such operations
$\operatorname*{p-lim\,sup}_{n\to\infty} Z_n$ and $\operatorname*{p-lim\,inf}_{n\to\infty} Z_n$
are needed. On the other hand, these novel limit operations
$\operatorname*{p-lim\,sup}_{n\to\infty} Z_n$ and $\operatorname*{p-lim\,inf}_{n\to\infty} Z_n$
are indeed indispensable and play the key roles in constructing general
formulas in information theory and proving them. Then, the usual convergence
in probability, denoted by $\operatorname*{p-lim}_{n\to\infty} Z_n = c$, is equivalent to
$\operatorname*{p-lim\,sup}_{n\to\infty} Z_n = \operatorname*{p-lim\,inf}_{n\to\infty} Z_n = c$.
It should be noted that $\operatorname*{p-lim\,sup}_{n\to\infty} Z_n$ and
$\operatorname*{p-lim\,inf}_{n\to\infty} Z_n$ are always well-defined, in
contrast with $\operatorname*{p-lim}_{n\to\infty} Z_n$. From this point of view, we may say that the
standard textbooks as above are written basically under the key assumption
that the limit $\operatorname*{p-lim}_{n\to\infty} Z_n$ exists (i.e., the weak law of large numbers), where
$Z_n$ $(n = 1, 2, \ldots)$ is a sequence of random variables of information-theoretic
nature.
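For a fixed n, the two limit operations can be mimicked by tail points of the distribution of Z_n. The following Python sketch is only an illustration under assumptions that are not in the text: the distribution is finite and given explicitly, the tolerance `eps` stands in for "probability tending to 0", and the function names are hypothetical.

```python
def upper_tail_point(dist, eps):
    """Finite-n stand-in for p-limsup: smallest support point a with Pr{Z > a} <= eps.

    dist is a list of (value, probability) pairs; eps is an assumed small tolerance.
    """
    values = sorted(v for v, _ in dist)
    for a in values:
        if sum(p for v, p in dist if v > a) <= eps:
            return a
    return values[-1]


def lower_tail_point(dist, eps):
    """Finite-n stand-in for p-liminf: largest support point b with Pr{Z < b} <= eps."""
    values = sorted((v for v, _ in dist), reverse=True)
    for b in values:
        if sum(p for v, p in dist if v < b) <= eps:
            return b
    return values[-1]


# Z takes the values 0.3 and 0.7 with probability 1/2 each: the two tail points differ.
two_point = [(0.3, 0.5), (0.7, 0.5)]
# Z is concentrated at 0.5: the two tail points coincide.
one_point = [(0.5, 1.0)]

print(upper_tail_point(two_point, 0.01), lower_tail_point(two_point, 0.01))
print(upper_tail_point(one_point, 0.01), lower_tail_point(one_point, 0.01))
```

If every Z_n had the two-point distribution above, the limit in probability would not exist, while the limit superior and limit inferior in probability would be 0.7 and 0.3, respectively.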

^3 This method was first devised by Han and Verdú [46].

^4 In the special case where $Z_n = z_n$ with constants $z_n$ $(n = 1, 2, \ldots)$,
$\operatorname*{p-lim\,sup}_{n\to\infty} Z_n$ and $\operatorname*{p-lim\,inf}_{n\to\infty} Z_n$
coincide with $\limsup_{n\to\infty} z_n$ and $\liminf_{n\to\infty} z_n$, respectively,
in the usual real calculus.

To get some insight into how these limit operations intervene in general
formulas, let us consider source coding theorems. The simplest
case is a stationary memoryless source $\mathbf{X} = (X_1, X_2, \ldots)$ with finite source
alphabet. In this case, we have

Theorem 1.

$$R_f(\mathbf{X}) = H(X_1), \tag{3}$$

where $H(X_1)$ is the entropy of $X_1$ and $R_f(\mathbf{X})$ denotes the infimum of all
achievable fixed-length coding rates.
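Theorem 1 can be checked numerically in a toy case. The sketch below is an illustration under assumptions of our own: a binary source with P(1) = 0.2, a fixed error tolerance of 0.1 at every block length (the theorem itself lets the error vanish), and hypothetical function names. It counts the most probable sequences needed to cover probability 1 - eps and compares the resulting rate with the entropy in nats.

```python
import math


def entropy(probs):
    """Entropy in nats of a finite distribution, with 0 log 0 taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def min_rate_bernoulli(n, p, eps):
    """Smallest (1/n) log M such that the M most probable length-n sequences of an
    i.i.d. Bernoulli(p) source (p < 1/2 assumed) carry probability at least 1 - eps."""
    target, mass, count = 1.0 - eps, 0.0, 0
    for k in range(n + 1):                       # k ones; sequence probability decreases with k
        seq_prob = p ** k * (1 - p) ** (n - k)
        class_size = math.comb(n, k)
        if mass + class_size * seq_prob < target:
            mass += class_size * seq_prob        # take the whole class of sequences with k ones
            count += class_size
        else:
            count += math.ceil((target - mass) / seq_prob)   # take only part of the class
            break
    return math.log(count) / n


h = entropy([0.2, 0.8])
for n in (50, 200, 400):
    print(n, round(min_rate_bernoulli(n, 0.2, 0.1), 4), round(h, 4))
```

The printed rates should approach the entropy as n grows, which is the behaviour that (3) describes.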

Next, if the source $\mathbf{X} = (X_1, X_2, \ldots)$ is stationary and ergodic with finite
source alphabet $\mathcal{X}$, Theorem 1 is generalized as follows:

Theorem 2.

$$R_f(\mathbf{X}) = \lim_{n\to\infty} \frac{1}{n} H(X^n), \tag{4}$$

where $H(X^n)$ is the entropy of $X^n = (X_1, X_2, \ldots, X_n)$.

Now, it would be natural to ask what happens if we consider the very general
case in which the source $\mathbf{X} = (X_1, X_2, \ldots)$ is nonstationary or nonergodic.
Some people may wonder whether anything meaningful can be said in such an
extremely general case. Even in this case, however, we want to have a source
coding theorem in parallel with Theorem 1 and Theorem 2. In other words,
what quantity should come instead of the right-hand side of (3) or (4)?

Let us show that this quantity cannot be expressed in terms of the traditional
entropies or entropy rates but can successfully be expressed with the
use of the probabilistic limit operation $\operatorname*{p-lim\,sup}_{n\to\infty} Z_n$. Define^5

$$\overline{H}(\mathbf{X}) = \operatorname*{p-lim\,sup}_{n\to\infty} \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)}, \tag{5}$$

which we call the spectral sup-entropy rate of the source $\mathbf{X}$. Then, with
countably infinite source alphabet $\mathcal{X}$ we have:

Theorem 3.

$$R_f(\mathbf{X}) = \overline{H}(\mathbf{X}). \tag{6}$$

In view of full generality, this formula looks simple enough and very elegant
to the author. It should be remarked here that the right-hand side of (4)
includes the expectation operation, whereas the right-hand side of (6) does
not include such an expectation operation.

It is easy to check that, if the source $\mathbf{X}$ is stationary and ergodic, the
right-hand side $\overline{H}(\mathbf{X})$ of (6) coincides with the right-hand side
$\lim_{n\to\infty} \frac{1}{n} H(X^n)$ of (4), so that Theorem 1 and Theorem 2, with any
countably infinite source alphabet (in place of finite source alphabet), follow
as special cases from Theorem 3.

^5 $P_{X^n}(\cdot)$ denotes the probability distribution of $X^n$.
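For the memoryless special case, the coincidence claimed above can be seen in one step. The following is a sketch only, under the assumption that the source is stationary memoryless with finite entropy $H(X_1)$; the general stationary ergodic case needs a stronger tool than the weak law of large numbers and is not covered by it.

```latex
\begin{align*}
\frac{1}{n}\log\frac{1}{P_{X^n}(X^n)}
  &= \frac{1}{n}\sum_{i=1}^{n}\log\frac{1}{P_{X_1}(X_i)}
     && \text{(independent, identically distributed letters)}\\
  &\xrightarrow{\ \text{in probability}\ }
     \mathrm{E}\!\left[\log\frac{1}{P_{X_1}(X_1)}\right] = H(X_1)
     && \text{(weak law of large numbers)}.
\end{align*}
% Hence the limit superior and the limit inferior in probability both equal H(X_1),
% so the right-hand side of (6) reduces to the right-hand side of (3).
```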






Once we have established the general source coding theorem (Theorem
3), we can apply formula (6) to any source $\mathbf{X}$. For example, we can apply (6)
to mixed sources, which are well known to be stationary but nonergodic. If
the source $\mathbf{X}$ is a mixture of component sources $\mathbf{X}_1$ and $\mathbf{X}_2$, formula (6)
yields

$$R_f(\mathbf{X}) = \max\bigl(\overline{H}(\mathbf{X}_1), \overline{H}(\mathbf{X}_2)\bigr). \tag{7}$$

Thus, we have illustrated in source coding what the information-spectrum
approach is.
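Formula (7) can be made tangible for a toy mixed source. The sketch below rests on assumptions of our own: two i.i.d. binary component sources with P(1) = 0.1 and 0.4, equal mixing weights, block length 2000, and a tail tolerance of 0.05 in place of the limit. All function names are hypothetical. It computes the exact distribution of (1/n) log(1/P(X^n)) and compares its tail points and its mean with the component entropies.

```python
import math


def binary_entropy(p):
    """Entropy in nats of a Bernoulli(p) letter."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


def log_binom(n, k):
    """Natural logarithm of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def mixed_spectrum(n, p1, p2, w):
    """Distribution of Z_n = (1/n) log(1 / P(X^n)) under the mixture
    P = w * Bernoulli(p1)^n + (1 - w) * Bernoulli(p2)^n, as (value, probability) pairs."""
    pairs = []
    for k in range(n + 1):                       # k = number of ones in the sequence
        a = math.log(w) + k * math.log(p1) + (n - k) * math.log(1 - p1)
        b = math.log(1 - w) + k * math.log(p2) + (n - k) * math.log(1 - p2)
        m = max(a, b)
        log_p = m + math.log(math.exp(a - m) + math.exp(b - m))   # log P(x^n), log-sum-exp
        pairs.append((-log_p / n, math.exp(log_binom(n, k) + log_p)))
    return pairs


def upper_tail_point(dist, eps):
    """Smallest support point a with Pr{Z > a} <= eps (support values assumed distinct)."""
    tail, answer = 0.0, None
    for value, prob in sorted(dist, reverse=True):
        if tail > eps:
            break
        answer = value
        tail += prob
    return answer


def lower_tail_point(dist, eps):
    """Largest support point b with Pr{Z < b} <= eps (support values assumed distinct)."""
    tail, answer = 0.0, None
    for value, prob in sorted(dist):
        if tail > eps:
            break
        answer = value
        tail += prob
    return answer


n, p1, p2, w = 2000, 0.1, 0.4, 0.5               # assumed toy parameters
spectrum = mixed_spectrum(n, p1, p2, w)
h1, h2 = binary_entropy(p1), binary_entropy(p2)
mean_rate = sum(z * q for z, q in spectrum)      # equals (1/n) H(X^n)

print("upper tail point:", upper_tail_point(spectrum, 0.05), " max entropy:", max(h1, h2))
print("lower tail point:", lower_tail_point(spectrum, 0.05), " min entropy:", min(h1, h2))
print("(1/n) H(X^n):    ", mean_rate, " average entropy:", w * h1 + (1 - w) * h2)
```

The upper tail point sits near the larger component entropy, in line with (7), whereas the normalized entropy (1/n) H(X^n) sits near the average of the two and therefore understates the rate that fixed-length coding needs.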

It is possible to illustrate it also for channel coding. Let us first consider,
as the simplest case, a stationary memoryless channel
$W = \{W(\cdot|\cdot) : \mathcal{X} \to \mathcal{Y}\}$ with finite input-output alphabets. In this case, we have:

Theorem 4.

$$C(\mathbf{W}) = \sup_{X} I(X;Y), \tag{8}$$

where $C(\mathbf{W})$ is the channel capacity of $W$; $X$ and $Y$ are the channel input
and the channel output of $W$; and $I(X;Y)$ is the mutual information of $X$
and $Y$.
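As a small check of (8), the supremum can be approximated by a grid search for a binary-input channel. The sketch below assumes a binary symmetric channel with crossover probability 0.1, which is our own choice of example, and compares the result with the familiar closed form log 2 - h(0.1) in nats; the function names are hypothetical.

```python
import math


def mutual_information(px, channel):
    """I(X;Y) in nats for input distribution px and transition matrix channel[x][y] = W(y|x)."""
    py = [sum(px[x] * channel[x][y] for x in range(len(px)))
          for y in range(len(channel[0]))]
    total = 0.0
    for x, p in enumerate(px):
        for y, w in enumerate(channel[x]):
            if p > 0 and w > 0:                  # terms with zero probability contribute 0
                total += p * w * math.log(w / py[y])
    return total


def capacity_binary_input(channel, steps=1000):
    """Grid search over input distributions (q, 1 - q); a crude stand-in for the sup in (8)."""
    return max(mutual_information([i / steps, 1 - i / steps], channel)
               for i in range(steps + 1))


delta = 0.1                                      # assumed crossover probability
bsc = [[1 - delta, delta], [delta, 1 - delta]]
closed_form = math.log(2) + delta * math.log(delta) + (1 - delta) * math.log(1 - delta)

print(capacity_binary_input(bsc), closed_form)
```

A grid search is adequate here only because the input is binary; for larger alphabets an iterative method would be the usual choice.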

Next, if the channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ is stationary and
ergodic (or information stable) with finite input-output alphabets $\mathcal{X}, \mathcal{Y}$,
Theorem 4 is generalized as follows:

Theorem 5.

$$C(\mathbf{W}) = \lim_{n\to\infty} \sup_{X^n} \frac{1}{n} I(X^n; Y^n), \tag{9}$$

where $X^n = (X_1, X_2, \ldots, X_n)$ is an input of block length $n$ and
$Y^n = (Y_1, Y_2, \ldots, Y_n)$ is the corresponding output of block length $n$, and $I(X^n; Y^n)$
is the mutual information of $X^n$ and $Y^n$.

Now, let us consider the most general channel that is neither stationary nor
ergodic with abstract alphabets $\mathcal{X}, \mathcal{Y}$. Then, what quantity should come
instead of the right-hand side of (8) or (9)?

Let us show that this quantity cannot be expressed in terms of the
traditional mutual informations or mutual information rates but can
successfully be expressed with the use of the probabilistic limit operation
$\operatorname*{p-lim\,inf}_{n\to\infty} Z_n$. Define

$$\underline{I}(\mathbf{X};\mathbf{Y}) = \operatorname*{p-lim\,inf}_{n\to\infty} \frac{1}{n} \log \frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)}, \tag{10}$$

which we call the spectral inf-mutual information rate of the channel
$\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ with the input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and the output
$\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$. Then, we have:






Theorem 6.

$$C(\mathbf{W}) = \sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y}). \tag{11}$$



It should be remarked here too that the right-hand side of (9) includes the 
expectation operation, whereas the right-hand side of (11) does not include 
such an expectation operation. 

It is not difficult to check that, if the channel $\mathbf{W}$ is stationary and ergodic
(or information stable), the right-hand side of (11) coincides with the
right-hand side of (9), so that Theorem 4 and Theorem 5, with any abstract
input-output alphabets (in place of finite input-output alphabets), follow as
special cases from Theorem 6.

Once we have established the general channel coding theorem (Theorem
6), we can apply formula (11) to any channel $\mathbf{W}$. For example, we can apply
(11) to mixed channels, which are well known to be stationary but nonergodic:
if the channel $\mathbf{W}$ is a mixture of component channels $\mathbf{W}_1$ and $\mathbf{W}_2$, formula
(11) yields

$$C(\mathbf{W}) = \sup_{\mathbf{X}} \min\bigl(\underline{I}(\mathbf{X};\mathbf{Y}_1), \underline{I}(\mathbf{X};\mathbf{Y}_2)\bigr), \tag{12}$$

where $\mathbf{Y}_1, \mathbf{Y}_2$ are the outputs due to input $\mathbf{X}$ via $\mathbf{W}_1$ and $\mathbf{W}_2$, respectively.

So far we have just browsed the information-spectrum approach to source
coding and channel coding. It will turn out that this kind of
information-spectrum approach applies as well to most of the major subjects in
information theory, including source coding (Chapter 1), random number generation
(Chapter 2), channel coding (Chapter 3), hypothesis testing (Chapter 4),
rate-distortion theory (Chapter 5), identification code and channel
resolvability (Chapter 6), and multi-terminal information theory (Chapter 7). In each
chapter we have tried to recapitulate standard theorems and extend them
so as to be as general as possible. The reason why, unlike in the standard
textbooks, we have included Chapter 2 and Chapter 6 is that we wanted
to highlight their intrinsic information-theoretical nature as thoroughly as
possible.

We now have to elucidate, from the methodological point of view, how
the information-spectrum approach differs from the traditional approach.

Information theory can be primarily regarded as a discipline that tries
to link two different kinds of quantities: one is operational quantities defined
by using operational concepts such as encoder, decoder, codeword length,
compression rate, transmission rate, probability of error (or convergence rate
of error probability) and so on; whereas the other is information-theoretic
quantities such as entropy, divergence, mutual information, and so on.^6

^6 For example, Gray and Ornstein [34, p. 294] argue that “A principal goal of the
Shannon Theory is to prove coding theorems relating such operational capacities
to information-theoretic extremum problems; that is, to quantities involving
extreme values over all probabilistic functions or, equivalently, extremizing some
functional of probability measures over an appropriately constrained space.”

Theorems in the traditional information theory are usually written in the
form:

$$\alpha = \gamma, \tag{13}$$

where $\alpha$ is an operational quantity and $\gamma$ is an information-theoretic quantity.
For example, in Theorem 1

$$\alpha = R_f(\mathbf{X}), \quad \gamma = H(X_1); \tag{14}$$

and in Theorem 2

$$\alpha = R_f(\mathbf{X}), \quad \gamma = \lim_{n\to\infty} \frac{1}{n} H(X^n). \tag{15}$$

Furthermore, in Theorem 4

$$\alpha = C(\mathbf{W}), \quad \gamma = \max_{X} I(X;Y); \tag{16}$$

and in Theorem 5

$$\alpha = C(\mathbf{W}), \quad \gamma = \lim_{n\to\infty} \sup_{X^n} \frac{1}{n} I(X^n; Y^n). \tag{17}$$

On the other hand, the core of the information-spectrum methods is to
intentionally partition the logical process of establishing the link $\alpha = \gamma$ into two
parts, where a third kind of quantities (called information-spectrum
quantities, $\beta$'s) intervene between operational quantities, $\alpha$'s, and
information-theoretic quantities, $\gamma$'s. For example, Theorem 3 is written as

$$\alpha = \beta \tag{18}$$

with $\alpha = R_f(\mathbf{X})$ and $\beta = \overline{H}(\mathbf{X})$, and Theorem 6 is written as

$$\alpha = \beta \tag{19}$$

with $\alpha = C(\mathbf{W})$ and $\beta = \sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y})$, respectively. Here, both
$\beta = \overline{H}(\mathbf{X})$ and $\beta = \sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y})$ are information-spectrum quantities.

Now, we partition the proof of the theorem $\alpha = \gamma$ as in (13) into two
parts:

$$\alpha = \beta \quad \text{and} \quad \beta = \gamma, \tag{20}$$

where $\beta$ is an information-spectrum quantity.

The pleasing advantage of considering such a partition of the proof,
instead of the usual direct proof, is that the first formula $\alpha = \beta$ holds in full
generality without any assumptions about information-theoretic coding
systems and also that the formula $\alpha = \beta$ can be written in a very simple unified
form with any abstract source/channel alphabets.






This is because we are using the information-spectrum language. Although,
seemingly, simplicity may contradict generality, it is revealed that, in
the world of information-spectra, the key cores of mathematical (or logical)
arguments that have ever traditionally been developed in information theory
are actually very simple when we pursue generality as thoroughly as possible.

Another remarkable feature of the information-spectrum methods is that
all the information-theoretic coding aspects are pertinently processed and
completely taken into the formula $\alpha = \beta$. On the other hand, the second
formula $\beta = \gamma$ is entirely of probability-theoretic or statistical nature and has
nothing to do with information-theoretic coding aspects, the proof of which
depends on what information-theoretic coding systems we are considering.
Thus, with the general formula $\alpha = \beta$ established, the proof of the original
formula $\alpha = \gamma$ entirely boils down to how to show or compute $\beta = \gamma$,
depending on respective individual systems.

It should be emphasized here that the degree of generality of the
formula $\alpha = \beta$ is far beyond that of the formula $\beta = \gamma$. To see this, suppose
that we have several information sources $S_1, S_2, \ldots, S_m$ with different source
statistics. Now, suppose that we have the following $m$ theorems:

$$\alpha = \gamma_1, \quad \alpha = \gamma_2, \quad \ldots, \quad \alpha = \gamma_m, \tag{21}$$

which are usually proved in respective ways depending not only on the source
statistics but also on respective coding schemes.

However, with the general formula $\alpha = \beta$ established, it suffices to prove,
instead of (21), the following probabilistic formulas:

$$\beta = \gamma_1, \quad \beta = \gamma_2, \quad \ldots, \quad \beta = \gamma_m. \tag{22}$$

In principle, the proof of (22) is supposed to be much easier and much more
transparent than that of (21), because the computation needed to prove (22)
does not contain the coding-theoretic aspects. In other words, the proof of
(22) needs only respective probabilistic arguments.

The main purpose of this book is primarily to establish general
information-spectrum formulas of the form $\alpha = \beta$ for most of major subjects in
information theory, where it will turn out that the language and the techniques
(typically, including the technique of information-spectrum slicing) of the
information-spectrum methods are indeed fit to formulate and prove many
theorems of information theory in the most general forms.

Although this book would surely be the first attempt to apply the
information-spectrum methods to most major subjects of information theory
in a systematic and comprehensive manner, the description as well as
the logic of the book is elementary and self-contained, so that only a little
knowledge of information theory is required in advance, except for several
rather advanced topics. The style of the book is quite unconventional and
different from that of any other textbook of information theory, while the
theme of the book is orthodox in the sense that the way of problem setting
is completely faithful to the tradition of information theory.

This book contains a considerable number of historical remarks together 
with a rather extensive list of references, though far from complete. 

Information-spectrum methods are still very immature and need to be
cultivated much further. For example, the specific nonstationary or nonergodic
examples that we were able to demonstrate in this book are, in most cases,
mixed sources and mixed channels. This is because mixed sources and mixed
channels are very simple but typically nonergodic, while, regrettably enough,
we have not yet succeeded in demonstrating other types of much more
substantially nonstationary and/or nonergodic sources and/or channels. This
problem remains to be explored further along the lines of the
information-spectrum methods.

However, I believe that the reader of this book would become more or less 
significantly familiar with both technical details and conceptual perspectives 
of information theory that are outside the world of stationarity and ergodicity. 



Acknowledgments 

I would like to express my sincere gratitude to all the people who, in one 
way or another, helped me publish this book. First of all, I should heartily 
thank Prof. Emeritus Shun-ichi Amari who not only has constantly encour- 
aged me toward research activities but also provided me with the chance to 
publish this English edition. I thank very much Prof. Shunsuke Ihara who has
read through the Japanese edition of this book, which occasioned a refinement
of the description of Chapter 5. Special thanks should go to Prof. Hiroshi
Nagaoka. Discussions with him were always not only pleasant but also stimu- 
lating. My cordial thanks also should go to Prof. Kingo Kobayashi and Prof. 
Mamoru Hoshi for their warm-hearted support. 

I am very grateful to Prof. Hiroki Koga who translated the original 
Japanese edition into the present English edition. His excellent translation is 
greatly appreciated. Finally, many thanks go to Dr. Mitsuharu Arimura for 
kindly keeping my computer facilities working well. 



Te Sun HAN 
April 7, 2002 



Table of Contents 



1 Source Coding
1.1 Source Coding: Fixed-Length Codes
1.2 Source Coding: Variable-Length Codes
1.3 Coding for General Sources: Fixed-Length Codes
1.4 Fixed-Length Coding for Mixed Sources
1.5 Strong Converse Theorem for Source Coding
1.6 ε-Source Coding
1.7 Coding for General Sources: Variable-Length Codes
1.8 Coding for General Source: Weak Variable-Length Codes
1.9 Source Coding and Large Deviation: Decoding Error Probability
1.10 Source Coding and Large Deviation: Probability of Correct Decoding
1.11 Reliability Functions of the General Source with Variable-Length Coding
1.12 Information Spectrum and Invariancy

2 Random Number Generation
2.1 Random Number Generation
2.2 Resolvability and Intrinsic Randomness
2.3 Strong Converse Theorem for Random Number Generation
2.4 δ-Random Number Generation
2.5 Variable-Length Intrinsic Randomness
2.6 Random Number Generation and Source Coding

3 Channel Coding
3.1 Channel Coding: Stationary Memoryless Channel
3.2 Coding for General Channel
3.3 Coding for Mixed Channels
3.4 ε-Channel Coding
3.5 Strong Converse Theorem on Channel Coding
3.6 Channel Capacity with Cost Constraint
3.7 Strong Converse Property of Channel with Cost Constraint
3.8 Joint Source-Channel Coding
3.9 Separation Theorems of the Traditional Type

4 Hypothesis Testing
4.1 Hypothesis Testing
4.2 ε-Hypothesis Testing
4.3 Strong Converse Theorem for Hypothesis Testing
4.4 Hypothesis Testing and Large Deviation Probability of Testing Error
4.5 Hypothesis Testing and Large Deviation: Probability of Correct Testing
4.6 Generalized Hypothesis Testing
4.7 Hypothesis Testing and Source Coding

5 Rate-Distortion Theory
5.1 Coding Subject to Distortion Criterion
5.2 Rate-Distortion Theory for Stationary Memoryless Sources
5.3 General Rate-Distortion Theory
5.4 Rate-Distortion Function $R_{fm}(D|\mathbf{X})$
5.5 Rate-Distortion Function $R_{fa}(D|\mathbf{X})$
5.6 Rate-Distortion Function $R_{vm}(D|\mathbf{X})$
5.7 Rate-Distortion Function $R_{va}(D|\mathbf{X})$
5.8 Rate-Distortion for Stationary Memoryless Sources Revisited
5.9 Rate-Distortion for Stationary Ergodic Sources
5.10 Rate-Distortion Function for Mixed Sources

6 Identification Code and Channel Resolvability
6.1 Identification Code and Channel Resolvability
6.2 Identification Coding
6.3 Channel Resolvability
6.4 Identification Capacity Theorem and Channel Resolvability Theorem
6.5 Identification Capacity with Cost Constraint
6.6 Channel Resolvability with Cost Constraint
6.7 Identification Capacity and Resolvability of Continuous Input Channels
6.8 Identification-Transmission Codes

7 Multi-Terminal Information Theory
7.1 What Is Multi-Terminal Information Theory?
7.2 The Slepian-Wolf Source Coding System
7.3 Slepian-Wolf Source Coding for Mixed Sources
7.4 ε-Source Coding for Slepian-Wolf Source Coding System
7.5 Strong Converse Theorem for Slepian-Wolf Source Coding System
7.6 Multiple-Access Channel Coding Systems
7.7 General Capacity Region Theorem for Multiple-Access Channels
7.8 Stationary Memoryless Multiple-Access Channels
7.9 Mixed Multiple-Access Channels
7.10 Proof of Theorem 7.7.1
7.11 ε-Coding for Multiple-Access Channel
7.12 Strong Converse Theorem for Multiple-Access Channels
7.13 Multiple-Access Channels with Cost Constraint

References

Index



1 Source Coding 



1.1 Source Coding: Fixed-Length Codes 

At the beginning of this chapter, we describe source coding of a special but
very important source $\mathbf{X}$, that is, we consider the source coding of a stationary
memoryless source

$$\mathbf{X} = (X_1, X_2, \ldots).$$

A stochastic process $(X_1, X_2, \ldots)$ is called a stationary memoryless source if
$X_1, X_2, \ldots$ are independently generated according to an identical probability
distribution. Here, the set in which the random variables $X_i$ $(i = 1, 2, \ldots)$ take
values is denoted by $\mathcal{X}$ and called the source alphabet. We assume that $\mathcal{X}$ is a
finite set in this section and the following section. Since giving a probability
distribution of $X_1$ is enough to specify such a stationary memoryless source,
we often set $X = X_1$ and simply call $X$ the stationary memoryless source.

We formulate the fixed-length source coding problem as follows. First,
for each $n = 1, 2, \ldots$ we define a set of integers $\mathcal{M}_n = \{1, 2, \ldots, M_n\}$ called
the code. The operation that transforms an output $\mathbf{x} \in \mathcal{X}^n$ of length $n$ to
an element $i \in \mathcal{M}_n$ of the code is called the (block) coding. The mapping
$\varphi_n : \mathcal{X}^n \to \mathcal{M}_n$ denoting this transform is called the encoding function or
the encoder. We call $\varphi_n(\mathbf{x})$ the codeword of $\mathbf{x}$. A receiver who received the
codeword $i = \varphi_n(\mathbf{x})$ tries to reproduce its corresponding source output $\mathbf{x}$.
This operation is called the decoding and the mapping $\psi_n : \mathcal{M}_n \to \mathcal{X}^n$
denoting this transform is called the decoding function or the decoder (see
Fig. 1.1).



Fig. 1.1. (Block diagram: source, encoder, decoder.)






We take the standpoint that the decoded sequence need not always be equal
to its original sequence but should coincide with the original in almost all
cases. This standpoint motivates us to introduce measures that objectively
represent the performance of source coding. We introduce
the following two notions. One is the error probability and the other is the
coding rate. First, we define the error probability by

$$\varepsilon_n = \Pr\{X^n \neq \psi_n(\varphi_n(X^n))\},$$

where $X^n = (X_1, X_2, \ldots, X_n)$ is the random variable denoting an output
sequence of length $n$ from a source. In general, a smaller error probability
means better source coding. However, in order to make the error probability
small, we need to use a code $\mathcal{M}_n$ with a larger size $M_n$. This motivates
us to define a quantity that quantitatively represents the size of the code. We
define the quantity

$$r_n = \frac{1}{n} \log M_n$$

and call it the coding rate (hereafter, log means the natural logarithm unless
stated otherwise). The coding rate means the amount of information, $\log M_n$,
used for encoding a source output of length $n$, per source letter. From the
viewpoint of transmitting a codeword to a receiver, a smaller coding rate
means more efficient encoding. However, there is a tradeoff between the
coding rate and the error probability, i.e., a smaller coding rate may cause
a larger error probability. Consequently, source coding problems are usually
formulated as finding the minimum coding rate such that there exists a pair
of an encoder $\varphi_n$ and a decoder $\psi_n$ with the error probability $\varepsilon_n$ less than a
given constant.

In order to clearly describe such source coding problems, we call a pair
$(\varphi_n, \psi_n)$ of an encoder and a decoder with a code $\mathcal{M}_n$ of size $M_n$ and the
error probability $\varepsilon_n$ the $(n, M_n, \varepsilon_n)$-code. First, we consider the constraint
that $\varepsilon_n$ goes to zero as the block length $n$ goes to infinity. The problem in
which we are interested is how small we can make the coding rate subject to
this constraint. This leads to the following definition. Here, note that “rate”
means a nonnegative real number throughout this book.

Definition 1.1.1.

Rate $R$ is achievable $\overset{\mathrm{def}}{\Longleftrightarrow}$ There exists an $(n, M_n, \varepsilon_n)$-code satisfying

    $$\lim_{n \to \infty} \varepsilon_n = 0 \quad \text{and} \quad \limsup_{n \to \infty} \frac{1}{n} \log M_n \le R.$$

Definition 1.1.2 (Infimum achievable fixed-length coding rate).

    $$R_f(\mathbf{X}) = \inf \{R \mid R \text{ is achievable}\}.$$

The first problem to be considered in this chapter is finding this $R_f(\mathbf{X})$.
Before we give an answer to this question, we introduce the following notation.






Let $P_Z$ denote the probability distribution of a random variable $Z$ taking
values in a set $\mathcal{Z}$. That is, $P_Z(z) = \Pr\{Z = z\}$ ($z \in \mathcal{Z}$).

By using this notation, define the entropy $H(X)$ of a stationary memoryless
source $X$ by

    $$H(X) = -\sum_{x \in \mathcal{X}} P_X(x) \log P_X(x),$$

where $0 \log 0$ is defined as 0. The entropy $H(X)$ is also denoted by $H(P_X)$,
where $P_X$ denotes the probability distribution of $X$. Clearly, $H(X) \ge 0$. In
addition, $H(X) \le \log |\mathcal{X}|$ holds (cf. Cover and Thomas [17]).

Then, we have the following theorem:

Theorem 1.1.1 (Shannon [77]).

    $$R_f(\mathbf{X}) = H(X).$$



Proof. 

1) Direct part: $R_f(\mathbf{X}) \le H(X)$

Set $X^n = (X_1, \cdots, X_n)$ and for an arbitrarily small $\gamma > 0$ define the set
$T_n$ by

    $$T_n = \left\{ \mathbf{x} \in \mathcal{X}^n \;\middle|\; \left| \frac{1}{n} \log \frac{1}{P_{X^n}(\mathbf{x})} - H(X) \right| < \gamma \right\},$$

where elements of $T_n$ are called the typical sequences of a stationary memoryless
source $X$. Since the source is stationary and memoryless, for any
$\mathbf{x} = (x_1, \cdots, x_n) \in \mathcal{X}^n$ we have

    $$P_{X^n}(\mathbf{x}) = P_X(x_1) \cdots P_X(x_n)$$    (1.1.1)

and therefore

    $$\frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} = \frac{1}{n} \sum_{i=1}^{n} \log \frac{1}{P_X(X_i)}.$$

Since the terms in the sum on the right-hand side are independent and subject
to an identical probability distribution, all of their expectations are equal to
the entropy $H(X)$. In addition, it is easy to verify that their variances are
finite from the assumption that the source alphabet is finite. Then, Chebyshev’s
inequality (see Remark 1.1.1) yields

    $$\Pr\{X^n \in T_n\} \to 1 \quad \text{as } n \to \infty$$    (1.1.2)

(the weak law of large numbers).



Remark 1.1.1. Let $Z \in \mathcal{Z}$ be an arbitrary random variable and denote by
$\mu$ and $\sigma^2$ the expectation and the variance of $Z$. Then, for any $a > 0$ it holds
that

    $$\Pr\{|Z - \mu| \ge a\sigma\} \le \frac{1}{a^2}.$$    (1.1.3)

This inequality is called Chebyshev’s inequality. By setting $\mu = H(X)$,
$\sigma^2 = \sigma_0^2 / n$ and $Z = \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)}$ and rewriting (1.1.3) as

    $$\Pr\{|Z - \mu| < a\sigma\} \ge 1 - \frac{1}{a^2},$$

we obtain (1.1.2), where $a = \frac{\gamma}{\sigma} = \frac{\gamma \sqrt{n}}{\sigma_0}$ and $\sigma_0^2$ denotes the variance of
$\log \frac{1}{P_X(X)}$, which is bounded (more precisely, $\sigma_0^2 \le \frac{4|\mathcal{X}|}{e^2}$: see Remark 3.1.1 in Chapter 3).

We can obtain Chebyshev’s inequality as a special case of Markov’s inequality.
Here, Markov’s inequality means the inequality

    $$\Pr\{Z \ge a\mu\} \le \frac{1}{a}$$    (1.1.4)

for any $a > 0$, where $Z$ is an arbitrary nonnegative random variable with
mean $\mu$. This inequality is proved in the following way. Define $S = \{z \in \mathcal{Z} \mid
z \ge a\mu\}$. Then, we have

    $$\mu = \sum_{z \in \mathcal{Z}} z P_Z(z)
          = \sum_{z \in S} z P_Z(z) + \sum_{z \notin S} z P_Z(z)
          \ge a\mu \sum_{z \in S} P_Z(z) = a\mu P_Z(S),$$

which implies $P_Z(S) \le \frac{1}{a}$. Thus (1.1.4) is established.

By using $(Z - \mu)^2$ and $a^2$ instead of $Z$ and $a$, respectively, in Markov’s
inequality (1.1.4), we obtain Chebyshev’s inequality (1.1.3). □



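Now, we return to the proof of the direct part. Since $P_{X^n}(\mathbf{x}) \ge e^{-n(H(X)+\gamma)}$
for all $\mathbf{x} \in T_n$, it follows that

    $$1 \ge \sum_{\mathbf{x} \in T_n} P_{X^n}(\mathbf{x}) \ge \sum_{\mathbf{x} \in T_n} e^{-n(H(X)+\gamma)} = |T_n| \, e^{-n(H(X)+\gamma)},$$

which leads to $|T_n| \le e^{n(H(X)+\gamma)}$. If we set $M_n$, the size of the code, as
$M_n = \lceil e^{n(H(X)+\gamma)} \rceil$, we can define an encoding function $\varphi_n : \mathcal{X}^n \to \mathcal{M}_n =
\{1, 2, \cdots, M_n\}$ that is one-to-one from $T_n$ to $\mathcal{M}_n$ and maps all elements not
belonging to $T_n$ to 1. We define a decoding function $\psi_n : \mathcal{M}_n \to \mathcal{X}^n$ as
the mapping satisfying $\psi_n(i) = \mathbf{x}$ if $i = \varphi_n(\mathbf{x})$ for some $\mathbf{x} \in T_n$. That is,
$\psi_n$ is the inverse of $\varphi_n|_{T_n}$. Such decoding is obviously well defined. Since
$\mathbf{x} = \psi_n(\varphi_n(\mathbf{x}))$ holds for $\mathbf{x} \in T_n$, we can decode all the elements of $T_n$
correctly. Therefore, the error probability can be evaluated as

    $$\varepsilon_n = \Pr\{X^n \neq \psi_n(\varphi_n(X^n))\} \le \Pr\{X^n \notin T_n\}.$$

In view of (1.1.2), we have $\lim_{n \to \infty} \varepsilon_n = 0$. On the other hand, since
$M_n = \lceil e^{n(H(X)+\gamma)} \rceil$, we clearly obtain

    $$\limsup_{n \to \infty} \frac{1}{n} \log M_n \le H(X) + \gamma.$$

By recalling that $\gamma > 0$ is arbitrary, we can conclude that any rate $R$ satisfying
$R > H(X)$ is achievable.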
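The construction in the direct part can be checked by brute force on a small example. The following is a minimal sketch, assuming a hypothetical binary source with letter probabilities (0.8, 0.2), block length 12 and $\gamma = 0.15$ (none of these values come from the text); it enumerates the typical set and compares its size with the bound $e^{n(H(X)+\gamma)}$.

```python
import itertools
import math

def typical_set(p, n, gamma):
    """Enumerate T_n for an i.i.d. source with letter probabilities p (natural logs)."""
    H = -sum(q * math.log(q) for q in p if q > 0)
    T = []
    for x in itertools.product(range(len(p)), repeat=n):
        prob = math.prod(p[a] for a in x)          # P_{X^n}(x) = product of P_X(x_i)
        if prob > 0 and abs(-math.log(prob) / n - H) < gamma:
            T.append((x, prob))
    return H, T

p = [0.8, 0.2]            # hypothetical source distribution (an assumption)
n, gamma = 12, 0.15       # hypothetical block length and tolerance
H, T = typical_set(p, n, gamma)
size_bound = math.exp(n * (H + gamma))   # |T_n| <= e^{n(H(X)+gamma)}
prob_T = sum(prob for _, prob in T)      # Pr{X^n in T_n}
print(len(T), "typical sequences out of", len(p) ** n)
print("size bound:", size_bound, " Pr{T_n}:", prob_T)
```

At such a short block length the typical set carries only part of the probability; the statement (1.1.2) is asymptotic, and enumeration becomes infeasible long before the limit is visible.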

Remark 1.1.2. In the proof above we have seen that $M_n = \lceil e^{n(H(X)+\gamma)} \rceil$.
Such values rounded to integers will often appear in this book. However, as far
as there is no confusion, we will omit the rounding up operation $\lceil \cdot \rceil$ and the
rounding down operation $\lfloor \cdot \rfloor$ and express them in the form of $M_n = e^{n(H(X)+\gamma)}$
for simplicity. □



2) Converse part: $R_f(\mathbf{X}) \ge H(X)$

We prove the converse part by using the Fano inequality given below.
Denote by $H(X|Y)$ the conditional entropy of $X$ given $Y$ defined by

    $$H(X|Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P_{XY}(x, y) \log P_{X|Y}(x|y),$$

where $P_{X|Y}(x|y)$ denotes the conditional probability of $X = x$ for given
$Y = y$.

Lemma 1.1.1 (Fano inequality [26]). Let $X$ and $Y$ be arbitrary random
variables taking values in finite sets $\mathcal{X}$ and $\mathcal{Y}$ satisfying $\mathcal{X} = \mathcal{Y}$. Define $\varepsilon =
\Pr\{X \neq Y\}$. Then, we have

    $$H(X|Y) \le \varepsilon \log(|\mathcal{X}| - 1) + h(\varepsilon),$$

where $h(x)$ means the binary entropy defined by

    $$h(x) = -x \log x - (1 - x) \log(1 - x) \quad (0 \le x \le 1).$$






Proof. Define $Z$ as the random variable that is equal to 0 if $X = Y$ and 1
otherwise. From the chain rule of the entropy (e.g., Cover and Thomas [17])
we have

    $$H(XZ|Y) = H(X|Y) + H(Z|XY) = H(X|Y).$$    (1.1.5)

Here, notice that $H(Z|XY) = 0$ since $Z$ is a function of $X$ and $Y$. On the
other hand, we also have

    $$H(XZ|Y) = H(Z|Y) + H(X|ZY) \le H(Z) + H(X|ZY) = h(\varepsilon) + H(X|ZY).$$    (1.1.6)

We notice here that $H(X|ZY)$ is further evaluated in the following way:

    $$H(X|ZY) = \Pr\{Z = 0\} H(X|Y, Z = 0) + \Pr\{Z = 1\} H(X|Y, Z = 1)
              = \varepsilon H(X|Y, Z = 1) \le \varepsilon \log(|\mathcal{X}| - 1),$$    (1.1.7)

where $H(X|Y, Z = 0) = 0$ is used since $X$ is uniquely determined from $Y$ for
the case of $Z = 0$. In addition, the inequality in (1.1.7) is obtained from the
fact that, for the case of $Z = 1$, $X$ can take $|\mathcal{X}| - 1$ values different from $Y$.
By summarizing (1.1.5)-(1.1.7), we obtain the Fano inequality. □
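The Fano inequality is easy to check numerically. The following is a minimal sketch, assuming a hypothetical joint distribution on a 4-letter alphabet generated from a fixed random seed (the distribution and the names `h` and `fano_sides` are illustrative, not from the text); it evaluates both sides of Lemma 1.1.1 in nats.

```python
import math
import random

def h(x):
    """Binary entropy in nats, with the convention 0 log 0 = 0."""
    return -sum(t * math.log(t) for t in (x, 1.0 - x) if t > 0)

def fano_sides(joint):
    """Return (H(X|Y), eps*log(|X|-1) + h(eps)) for a square joint pmf joint[x][y], |X| >= 2."""
    m = len(joint)
    p_y = [sum(joint[x][y] for x in range(m)) for y in range(m)]
    cond_entropy = -sum(
        joint[x][y] * math.log(joint[x][y] / p_y[y])
        for x in range(m) for y in range(m) if joint[x][y] > 0
    )
    eps = sum(joint[x][y] for x in range(m) for y in range(m) if x != y)  # Pr{X != Y}
    return cond_entropy, eps * math.log(m - 1) + h(eps)

rng = random.Random(0)    # fixed seed; the joint pmf below is a made-up example
m = 4
w = [[rng.random() + (3.0 if x == y else 0.0) for y in range(m)] for x in range(m)]
total = sum(map(sum, w))
joint = [[v / total for v in row] for row in w]
lhs, rhs = fano_sides(joint)
print("H(X|Y) =", lhs, " bound =", rhs)
```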



Now, we are ready to prove the converse part. Fix an arbitrary $(n, M_n, \varepsilon_n)$-code
and define $\hat{X}^n = \psi_n(\varphi_n(X^n))$. Then, we have

    $$\log M_n \ge H(\varphi_n(X^n)) \ge H(\psi_n(\varphi_n(X^n)))
              = H(\hat{X}^n) \ge I(X^n; \hat{X}^n)
              = H(X^n) - H(X^n | \hat{X}^n)
              = nH(X) - H(X^n | \hat{X}^n),$$

where $I(X^n; \hat{X}^n)$ denotes the mutual information between $X^n$ and $\hat{X}^n$ (see
§3.1 in Chapter 3: cf. Cover and Thomas [17]) and we use $H(X^n) = nH(X)$
from the assumption that the source is stationary and memoryless. If we
apply the Fano inequality using $X^n$ and $\hat{X}^n$ instead of $X$ and $Y$, it
follows that

    $$H(X^n | \hat{X}^n) \le \varepsilon_n \log(|\mathcal{X}|^n - 1) + h(\varepsilon_n)
                         \le n \varepsilon_n \log |\mathcal{X}| + h(\varepsilon_n),$$

where $\varepsilon_n = \Pr\{X^n \neq \hat{X}^n\}$ denotes the error probability. Consequently, we
have

    $$\log M_n \ge nH(X) - n \varepsilon_n \log |\mathcal{X}| - h(\varepsilon_n),$$

i.e.,

    $$\frac{1}{n} \log M_n \ge H(X) - \varepsilon_n \log |\mathcal{X}| - \frac{h(\varepsilon_n)}{n}.$$    (1.1.8)

Now, suppose that a rate $R$ is achievable. Notice that

    $$\lim_{n \to \infty} \varepsilon_n = 0 \quad \text{and} \quad \limsup_{n \to \infty} \frac{1}{n} \log M_n \le R$$

from the definition. By taking $\limsup_{n \to \infty}$ of both sides of (1.1.8), we can
conclude that $R \ge H(X)$. That is, any achievable rate $R$ cannot be less than the
entropy. □



Remark 1.1.3. Many theorems in information theory are described in the
form of “$A = B$,” where $A$ is a quantity with an operational meaning that
characterizes performance of encoding and $B$ a quantity without such an
operational meaning. In order to establish theorems claiming “$A = B$,” we
always prove the two inequalities “$A \ge B$” and “$A \le B$,” respectively.
One of the inequalities is called the direct part and the other is called
the converse part. Which of the two is called the direct part and which is
called the converse part depends on the context of the theorem.
However, in general, the inequality showing that an operation is possible up
to some quantity (i.e., showing that the quantity is achievable) is the direct
part and the inequality showing that the operation becomes impossible beyond
the quantity (showing that the quantity is not achievable) is the converse
part. □



Remark 1.1.4. In the proof 1) above, the set of the typical sequences is
denoted by $T_n$. The ratio of the cardinality of $T_n$ to the cardinality of $\mathcal{X}^n$ is
evaluated as

    $$\frac{|T_n|}{|\mathcal{X}^n|} \le \frac{e^{n(H(X)+\gamma)}}{|\mathcal{X}|^n} = e^{-n(\log |\mathcal{X}| - H(X) - \gamma)}.$$    (1.1.9)

If $H(X) < \log |\mathcal{X}|$ is satisfied, we can choose a sufficiently small $\gamma > 0$
satisfying $\log |\mathcal{X}| - H(X) - \gamma > 0$. This means that the rightmost side in
(1.1.9) goes to 0 as $n \to \infty$. That is, in the asymptotic sense of $n$ being
sufficiently large, $T_n$ is negligible in $\mathcal{X}^n$, like dust, from the viewpoint of
the size of sets. It is quite interesting to notice, however, that this $T_n$ has the
property that $\Pr\{X^n \in T_n\} \to 1$ as $n \to \infty$. That is, almost all probability
is concentrated on the dust $T_n$ (Fig. 1.2) if $n$ is sufficiently large. □



Fig. 1.2.

Remark 1.1.5. The code used in this section is called the fixed-length code
for the following reason. In the proof of the direct part a codeword
$i \in \mathcal{M}_n$ is an integer. In order to send $i$ to
the decoder, we translate the codeword $i$ into a string. Suppose that letters
included in the string take values in a code alphabet $\mathcal{U}$ satisfying $|\mathcal{U}| = K$. If
we set the length of the string to the fixed length $\lceil \log_K M_n \rceil$, we can obtain
a one-to-one mapping from the code $\mathcal{M}_n$ of size $M_n$ into the strings of
length $\lceil \log_K M_n \rceil$. The length of codewords per source letter is written as

    $$\frac{1}{n} \lceil \log_K M_n \rceil \simeq \frac{1}{n} \log_K M_n = r_n,$$

which is nothing but the coding rate using $K$ as the base of the logarithm. □



1.2 Source Coding: Variable-Length Codes 

In this section we assume that a source $\mathbf{X} = (X_1, X_2, \cdots)$ is stationary and
memoryless as is assumed in the preceding section. We use notations
such as $X^n = (X_1, \cdots, X_n)$ in the same sense. Though the theorem given
in this section holds even if $\mathcal{X}$ is a countably infinite source alphabet, we
assume that $\mathcal{X}$ is a finite set for simplicity. We consider the encoding that
transforms $\mathbf{x} \in \mathcal{X}^n$, an output sequence from the source of length $n$, into
a string each letter of which takes values in a set $\mathcal{U} = \{0, 1, 2, \cdots, K - 1\}$
called the code alphabet, where $K$ is an arbitrary integer satisfying $K \ge 2$.
If the length of the string depends on $\mathbf{x}$, we call such encoding variable-length
coding. More strictly, the encoding is defined as an encoding function
(encoder) $\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$, where $\mathcal{U}^*$ denotes all the strings of finite length
over $\mathcal{U}$ excluding the null string $\lambda$. We call $\varphi_n(\mathbf{x})$ the codeword of $\mathbf{x}$. On
the other hand, the operation that a receiver, who receives $\mathbf{u} = \varphi_n(\mathbf{x})$, uses
to reproduce the original source output from $\mathbf{u}$ is called the decoding. The
operation is defined as a decoding function (decoder) $\psi_n : \{\varphi_n(\mathbf{x})\}_{\mathbf{x} \in \mathcal{X}^n} \to \mathcal{X}^n$. The
fixed-length coding treated in the preceding section can be considered as a
special case of the variable-length coding in which all codewords have the same
length (cf. Remark 1.1.5).






There are various kinds of variable-length codes. However, since the objective
of this section is to understand the basic ideas of variable-length codes,
we focus on the class of prefix codes, which is one of the most fundamental
classes of variable-length codes. Here, a set of codewords $\mathcal{C} = \{\mathbf{u}_1, \cdots, \mathbf{u}_M\}$
is called a prefix code if it has the property that an arbitrary codeword $\mathbf{u}_i$ is
not a prefix of any other codeword $\mathbf{u}_j$ ($j \neq i$).

Example 1.2.1. A binary string 0101 is a prefix of 0101001 but not a prefix
of 01001. In addition, 0101 is a prefix of 0101 itself. □
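The prefix condition can be tested directly from the definition. The following is a minimal sketch in which codewords are represented as Python strings; the function names `is_prefix` and `is_prefix_code` are illustrative assumptions, not notation from the text.

```python
def is_prefix(u, v):
    """True if the string u is a prefix of the string v (every string is a prefix of itself)."""
    return len(u) <= len(v) and v[:len(u)] == u

def is_prefix_code(codewords):
    """True if no codeword is a prefix of any other codeword (compared by position)."""
    return not any(
        i != j and is_prefix(u, v)
        for i, u in enumerate(codewords)
        for j, v in enumerate(codewords)
    )

print(is_prefix("0101", "0101001"), is_prefix("0101", "01001"))
print(is_prefix_code(["0", "10", "110", "111"]), is_prefix_code(["0", "01", "11"]))
```

The pairwise comparison is quadratic in the number of codewords, which is adequate for small illustrative codes.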



Hereafter, we only consider the case that $\mathcal{C}_n = \{\varphi_n(\mathbf{x})\}_{\mathbf{x} \in \mathcal{X}^n}$ constitutes
a prefix code. We call the encoding using a prefix code the prefix coding. In
the framework of the prefix coding, we usually consider only a pair $(\varphi_n, \psi_n)$
of an encoder and a decoder with the error probability $\varepsilon_n$ equal to 0, that is,
the pair satisfying the condition

    $$\varepsilon_n = \Pr\{X^n \neq \psi_n(\varphi_n(X^n))\} = 0 \quad (\forall n = 1, 2, \cdots),$$

which is not required in the case of the fixed-length coding described in the
preceding section. This condition immediately implies that $\varphi_n$ and $\psi_n$ must
be one-to-one mappings.

In the fixed-length coding treated in the preceding section the coding rate
plays an important role. In the variable-length coding, however, the average
codeword length per source letter

    $$\frac{1}{n} \mathrm{E}|\varphi_n(X^n)|$$

plays a fundamental role, where $|\mathbf{u}|$ denotes the length of a sequence $\mathbf{u}$ and
$\mathrm{E}|\varphi_n(X^n)|$ means the expectation of the codeword length. This quantity is
called the coding rate of the variable-length coding. The basic problem of the
variable-length coding is making this coding rate as small as possible. This
problem is formally described in the following way:

Definition 1.2.1.

Rate $R$ is achievable $\overset{\mathrm{def}}{\Longleftrightarrow}$ There exists a prefix code satisfying

    $$\limsup_{n \to \infty} \frac{1}{n} \mathrm{E}|\varphi_n(X^n)| \le R.$$

Definition 1.2.2 (Infimum achievable variable-length coding rate).

    $$R_v(\mathbf{X}) = \inf \{R \mid R \text{ is achievable}\}.$$

Under the definitions above we have the following theorem corresponding
to Theorem 1.1.1 in the preceding section. Note that the subscript “$K$” of
$H_K(X)$ denotes the base of logarithms.






Theorem 1.2.1 (Shannon [77]).

    $$R_v(\mathbf{X}) = H_K(X).$$

Proof.

1) Direct part: $R_v(\mathbf{X}) \le H_K(X)$

First of all, for a sequence $\mathbf{u} \in \mathcal{U}^*$ define a subinterval $I(\mathbf{u})$ of width $K^{-m}$
in the unit interval $[0, 1)$ by

    $$I(\mathbf{u}) = [0.u_1 u_2 \cdots u_m, \; 0.u_1 u_2 \cdots u_m + K^{-m}),$$    (1.2.1)

where $\mathbf{u} = u_1 u_2 \cdots u_m$ and the left and the right endpoints of $I(\mathbf{u})$ are
expressed in the $K$-ary form. Then, we have a one-to-one correspondence
between $\mathbf{u}$ and $I(\mathbf{u})$. Such an interval is called a $K$-ary interval and $m$ is called
the length of the $K$-ary interval $I(\mathbf{u})$. It is easy to check that the following
lemma holds.

Lemma 1.2.1. A code $\mathcal{C} = \{\mathbf{u}_1, \cdots, \mathbf{u}_M\}$ ($\mathbf{u}_i \in \mathcal{U}^*$) is a prefix code if and
only if the $K$-ary intervals $I(\mathbf{u}_1), \cdots, I(\mathbf{u}_M)$ corresponding to $\mathbf{u}_1, \cdots, \mathbf{u}_M$,
respectively, are disjoint. □



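Now, define $M_n = |\mathcal{X}|^n$ and let

    $$\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_{M_n}$$

be the list of all elements in $\mathcal{X}^n$. Define the cumulative probability by

    $$P_i = \sum_{j=1}^{i-1} P_{X^n}(\mathbf{x}_j) \quad (i = 1, \cdots, M_n + 1),$$

where $P_1$ is defined as 0. In addition, define

    $$Q_i = P_i + \frac{1}{2} P_{X^n}(\mathbf{x}_i) \quad (i = 1, \cdots, M_n).$$    (1.2.2)

For each $\mathbf{x}_i$ let $I(\mathbf{u}_i)$ be the $K$-ary interval in $[0, 1)$ with the minimum length
containing $Q_i$ but containing neither $P_i$ nor $P_{i+1}$ (Fig. 1.3). Since $I(\mathbf{u}_i) \subset [P_i,
P_{i+1})$ clearly holds, all of $I(\mathbf{u}_1), \cdots, I(\mathbf{u}_{M_n})$ are disjoint. Hence, Lemma 1.2.1
guarantees that $\mathcal{C} = \{\mathbf{u}_1, \cdots, \mathbf{u}_{M_n}\}$ ($\mathbf{u}_i \in \mathcal{U}^*$) constitutes a prefix code. We
now define an encoding function $\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$ by $\mathbf{u}_i = \varphi_n(\mathbf{x}_i)$. On the
other hand, we define $\mathbf{u}_i' = u_1^{(i)} \cdots u_{m_i - 1}^{(i)}$ as the sequence obtained from
$\mathbf{u}_i = u_1^{(i)} \cdots u_{m_i}^{(i)}$ by removing the last letter of $\mathbf{u}_i$. Then, since $I(\mathbf{u}_i')$
contains $I(\mathbf{u}_i)$, we have $Q_i \in I(\mathbf{u}_i')$. Furthermore, at least either $P_i \in I(\mathbf{u}_i')$ or
$P_{i+1} \in I(\mathbf{u}_i')$ must hold from the definition of $I(\mathbf{u}_i)$. Since in either case the
width of the interval $I(\mathbf{u}_i')$ is greater than the half of $P_{i+1} - P_i =
P_{X^n}(\mathbf{x}_i)$, it holds that

    $$K^{-(m_i - 1)} > \frac{1}{2} P_{X^n}(\mathbf{x}_i).$$

Fig. 1.3.

That is, we have

    $$|\mathbf{u}_i| = m_i < -\log_K P_{X^n}(\mathbf{x}_i) + 1 + \log_K 2
                       \le -\log_K P_{X^n}(\mathbf{x}_i) + 2,$$    (1.2.3)

where we use the assumption that $K \ge 2$. Therefore, the average codeword
length is evaluated as follows:

    $$\mathrm{E}|\varphi_n(X^n)| \le -\sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x}) \log_K P_{X^n}(\mathbf{x}) + 2 = H_K(X^n) + 2.$$

Here, notice that (1.1.1) in the preceding section holds since we assume that
the source is stationary and memoryless. By substituting (1.1.1) into the
right-hand side of the equation above, it follows that

    $$\frac{1}{n} \mathrm{E}|\varphi_n(X^n)| \le H_K(X) + \frac{2}{n}.$$

Consequently,

    $$\limsup_{n \to \infty} \frac{1}{n} \mathrm{E}|\varphi_n(X^n)| \le H_K(X),$$

which guarantees that $H_K(X)$ is achievable.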

Remark 1.2.1. We can construct the prefix code used in the proof by defining
the codeword for $\mathbf{x}_i$ as the $K$-ary expansion of $Q_i$ in (1.2.2) truncated to length
$l_i$, where $l_i = \lceil -\log_K P_{X^n}(\mathbf{x}_i) \rceil + 1$. This code is called the Shannon-Fano-Elias
code. We can easily verify that the constructed code forms a prefix code
by checking that the $K$-ary interval $I(\mathbf{u}_i)$ corresponding to $\mathbf{u}_i$ is included in
$[P_i, P_{i+1})$ (cf. Cover and Thomas [17]). It is clear that this codeword length
satisfies inequality (1.2.3). □
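The construction in Remark 1.2.1 can be sketched in a few lines. The following is a minimal illustration, assuming exact rational probabilities (Python `Fraction`), a code alphabet of size $K \le 10$ so that each digit is a single character, and a hypothetical four-letter distribution; the function name `sfe_codewords` is an assumption of this sketch.

```python
from fractions import Fraction

def sfe_codewords(probs, K=2):
    """Shannon-Fano-Elias codewords (digit strings) for positive Fractions summing to 1."""
    words, cum = [], Fraction(0)               # cum plays the role of P_i
    for p in probs:
        q = cum + p / 2                        # Q_i = P_i + P(x_i)/2, as in (1.2.2)
        length = 0
        while Fraction(1, K ** length) > p:    # smallest l with K^{-l} <= p, i.e. ceil(-log_K p)
            length += 1
        length += 1                            # l_i = ceil(-log_K p) + 1
        digits, frac = [], q
        for _ in range(length):                # first l_i digits of the K-ary expansion of Q_i
            frac *= K
            d = int(frac)                      # integer part (frac is nonnegative)
            digits.append(str(d))
            frac -= d
        words.append("".join(digits))
        cum += p
    return words

probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 8), Fraction(1, 8)]  # hypothetical
words = sfe_codewords(probs)
print(words)
```

Exact fractions are used so that the truncated expansion and the ceiling are not affected by floating-point rounding.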



2) Converse part: $R_v(\mathbf{X}) \ge H_K(X)$

We prove the converse part by using the following two inequalities. 






Lemma 1.2.2 (Kraft inequality [59], [65]). Let $\mathcal{C} = \{\mathbf{u}_1, \cdots, \mathbf{u}_M\}$ ($\mathbf{u}_i \in
\mathcal{U}^*$) be a prefix code and define $l_i = |\mathbf{u}_i|$. Then,

    $$\sum_{i=1}^{M} K^{-l_i} \le 1.$$    (1.2.4)

Proof. From Lemma 1.2.1 the $K$-ary intervals $I(\mathbf{u}_i)$ ($i = 1, \cdots, M$) are disjoint.
In addition, since all of the intervals $I(\mathbf{u}_i)$ lie in the unit interval $[0, 1)$, the sum
of their widths $K^{-l_i}$ over all $i$ cannot be greater than 1. □
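The Kraft sum is simple to evaluate exactly. The following is a minimal sketch with hypothetical binary codeword sets; the function name `kraft_sum` is an assumption of this sketch.

```python
from fractions import Fraction

def kraft_sum(codewords, K):
    """Exact value of the sum of K^{-l_i} over the codeword lengths l_i."""
    return sum(Fraction(1, K ** len(u)) for u in codewords)

print(kraft_sum(["0", "10", "110", "111"], 2))   # a prefix code: the sum is at most 1
print(kraft_sum(["0", "1", "00"], 2))            # not a prefix code: the sum may exceed 1
```

A sum above 1 rules out the prefix property, but a sum of at most 1 does not by itself prove that a given set of strings is a prefix code.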



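Lemma 1.2.3 (Log-sum inequality: Csiszár and Körner [19]). For any
$p_i \ge 0$ and $q_i \ge 0$ ($i = 1, 2, \cdots, M$), we have

    $$\sum_{i=1}^{M} p_i \log \frac{p_i}{q_i} \ge p \log \frac{p}{q},$$    (1.2.5)

where $p = \sum_{i=1}^{M} p_i$ and $q = \sum_{i=1}^{M} q_i$. Here, we use the convention that $0 \log \frac{0}{b} = 0$
($b \ge 0$) and $c \log \frac{c}{0} = +\infty$ ($c > 0$).

Proof. Set $a = p/q$. Since substitution of $x = a q_i / p_i$ into the inequality
$\log x \le x - 1$ ($x > 0$) yields

    $$\log \frac{a q_i}{p_i} \le \frac{a q_i}{p_i} - 1,$$

by multiplying both sides by $p_i$ and summing with respect to $i$ it holds that

    $$\sum_{i=1}^{M} p_i \log \frac{a q_i}{p_i} \le aq - p = 0.$$

Thus, we obtain

    $$\sum_{i=1}^{M} p_i \log \frac{p_i}{q_i} \ge p \log a,$$

that is,

    $$\sum_{i=1}^{M} p_i \log \frac{p_i}{q_i} \ge p \log \frac{p}{q}.$$

□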
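The log-sum inequality can be checked numerically on arbitrary positive vectors. The following is a minimal sketch restricted to strictly positive entries (so the conventions for zeros are not exercised); the random test vectors and the function name `log_sum_sides` are assumptions of this sketch.

```python
import math
import random

def log_sum_sides(p, q):
    """Return (sum_i p_i log(p_i/q_i), p log(p/q)) for strictly positive p_i, q_i."""
    lhs = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    p_tot, q_tot = sum(p), sum(q)
    return lhs, p_tot * math.log(p_tot / q_tot)

rng = random.Random(0)   # fixed seed; the vectors are made-up test data
worst_gap = min(
    lhs - rhs
    for lhs, rhs in (
        log_sum_sides([rng.uniform(0.01, 1) for _ in range(5)],
                      [rng.uniform(0.01, 1) for _ in range(5)])
        for _ in range(1000)
    )
)
print("smallest lhs - rhs over 1000 trials:", worst_gap)
print(log_sum_sides([1, 2, 3], [2, 4, 6]))   # proportional vectors give equality
```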

Remark 1.2.2. If $p = q = 1$, $P$ and $Q$ defined as $P = (p_i)$ and $Q = (q_i)$
correspond to probability distributions. We define the divergence between $P$
and $Q$ by

    $$D(P \| Q) = \sum_{i=1}^{M} p_i \log \frac{p_i}{q_i},$$

which satisfies $D(P \| Q) \ge 0$ from Lemma 1.2.3. As we can see from the
proof of Lemma 1.2.3, $D(P \| Q) = 0$ if and only if $P = Q$. The divergence
$D(P \| Q)$ is denoted by $D(X \| Y)$ as well, where $X$ and $Y$ denote the random
variables subject to the probability distributions $P$ and $Q$, respectively. The
divergence, as well as the entropy, plays a crucial role in the framework of
information theory. □



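Now, we are ready to prove the converse part. Suppose that a prefix
code $\{\varphi_n(\mathbf{x})\}_{\mathbf{x} \in \mathcal{X}^n}$ is arbitrarily given. We use the notations $\mathcal{X}^n =
\{\mathbf{x}_1, \cdots, \mathbf{x}_{M_n}\}$ and $l_i = |\varphi_n(\mathbf{x}_i)|$ for the convenience of the proof. Setting
$q_i = K^{-l_i}$, we obtain

    $$q = \sum_{i=1}^{M_n} q_i \le 1$$    (1.2.6)

from Lemma 1.2.2. By defining $p_i = P_{X^n}(\mathbf{x}_i)$ and $p = \sum_{i=1}^{M_n} p_i = 1$ and applying
Lemma 1.2.3 to (1.2.6), it follows that

    $$\sum_{i=1}^{M_n} p_i \log_K \frac{p_i}{q_i} \ge p \log_K \frac{p}{q} = \log_K \frac{1}{q} \ge 0.$$    (1.2.7)

If we rewrite the leftmost side of this inequality as

    $$\sum_{i=1}^{M_n} p_i \log_K \frac{p_i}{q_i} = -\sum_{i=1}^{M_n} p_i \log_K q_i + \sum_{i=1}^{M_n} p_i \log_K p_i$$

and substitute $q_i = K^{-l_i}$ into it, we have

    $$\sum_{i=1}^{M_n} p_i \log_K \frac{p_i}{q_i} = \sum_{i=1}^{M_n} p_i l_i + \sum_{i=1}^{M_n} p_i \log_K p_i
        = \mathrm{E}|\varphi_n(X^n)| - H_K(X^n)
        = \mathrm{E}|\varphi_n(X^n)| - n H_K(X),$$

where $H_K(X^n) = n H_K(X)$ follows from the assumption that the source is
stationary and memoryless. Therefore, (1.2.7) leads to

    $$\mathrm{E}|\varphi_n(X^n)| \ge n H_K(X).$$

Therefore we obtain

    $$\limsup_{n \to \infty} \frac{1}{n} \mathrm{E}|\varphi_n(X^n)| \ge H_K(X),$$

which guarantees that any achievable rate cannot be less than $H_K(X)$. □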






1.3 Coding for General Sources: Fixed-Length Codes 



In this section we generalize the stationary memoryless sources treated in
§1.1 and §1.2. Actually, we will consider quite a wide class of sources. Let

    $$\mathbf{X} = \left\{ X^n = (X_1^{(n)}, X_2^{(n)}, \cdots, X_n^{(n)}) \right\}_{n=1}^{\infty}$$    (1.3.1)

denote a general source, where $X^n$ is a random variable over the $n$-th Cartesian
product $\mathcal{X}^n$ of an arbitrary source alphabet $\mathcal{X}$ ($\mathcal{X}$ is not restricted to a
finite set) subject to an arbitrary probability distribution $P_{X^n}$. Some general
sources satisfy the consistency condition

for any $m < n$. These are what we usually call stochastic processes. If the
source $\mathbf{X}$ satisfies the consistency condition, (1.3.1) can be simply written as

    $$\mathbf{X} = \{X^n = (X_1, X_2, \cdots, X_n)\}_{n=1}^{\infty}$$

or

    $$\mathbf{X} = (X_1, X_2, \cdots).$$

Since the general sources considered here are not required to satisfy the
consistency condition, they form a very wide class; in particular, the class
contains all nonstationary and/or nonergodic sources.

Here, we introduce new notations that play fundamental roles in all discussions
throughout this book. For an arbitrary sequence of real-valued random
variables $\{Z_n\}_{n=1}^{\infty}$ we define the following notions (cf. Han and Verdú [46]):

Definition 1.3.1 (Limit superior in probability).

    $$\text{p-}\limsup_{n \to \infty} Z_n = \inf \left\{ \alpha \;\middle|\; \lim_{n \to \infty} \Pr\{Z_n > \alpha\} = 0 \right\}.$$

Definition 1.3.2 (Limit inferior in probability).

    $$\text{p-}\liminf_{n \to \infty} Z_n = \sup \left\{ \beta \;\middle|\; \lim_{n \to \infty} \Pr\{Z_n < \beta\} = 0 \right\}.$$

These extend the notions of $\limsup_{n \to \infty}$ and $\liminf_{n \to \infty}$, with which they
coincide when $Z_n$ is a deterministic real-valued sequence, and therefore have
properties similar to the limit superior and the limit inferior. For example,
for arbitrary sequences of real-valued random variables $\{Z_n\}_{n=1}^{\infty}$ and
$\{V_n\}_{n=1}^{\infty}$ we have

    $$\text{p-}\limsup_{n \to \infty} (Z_n + V_n) \le \text{p-}\limsup_{n \to \infty} Z_n + \text{p-}\limsup_{n \to \infty} V_n,$$
    $$\text{p-}\liminf_{n \to \infty} (Z_n + V_n) \ge \text{p-}\liminf_{n \to \infty} Z_n + \text{p-}\liminf_{n \to \infty} V_n,$$
    $$\text{p-}\limsup_{n \to \infty} (Z_n + V_n) \ge \text{p-}\liminf_{n \to \infty} Z_n + \text{p-}\limsup_{n \to \infty} V_n,$$
    $$\text{p-}\liminf_{n \to \infty} (Z_n + V_n) \le \text{p-}\liminf_{n \to \infty} Z_n + \text{p-}\limsup_{n \to \infty} V_n,$$
    $$\text{p-}\liminf_{n \to \infty} Z_n \le \text{p-}\limsup_{n \to \infty} Z_n,$$
    $$\text{p-}\limsup_{n \to \infty} (-Z_n) = -\,\text{p-}\liminf_{n \to \infty} Z_n.$$

We also have

    $$\text{p-}\limsup_{n \to \infty} Z_n = \text{p-}\liminf_{n \to \infty} Z_n = c \iff \text{p-}\lim_{n \to \infty} Z_n = c,$$

where $\text{p-}\lim_{n \to \infty} Z_n = c$ means that $Z_n$ converges in probability to a constant
$c$.

We define the fixed-length coding for a general source $\mathbf{X}$ in the same
way as in §1.1. $R_f(\mathbf{X})$ is also defined in the same way (cf.
Definition 1.1.1 and Definition 1.1.2). Note that, unless stated otherwise, all
the results described in this section and the following sections hold for sources
with countably infinite source alphabet $\mathcal{X}$.

We call

    $$\frac{1}{n} \log \frac{1}{P_{X^n}(X^n)}$$

and its distribution the entropy density rate of $\mathbf{X}$ and the entropy-spectrum
(or, more generally, the information-spectrum), respectively (Han and Verdú
[46]). In addition, we define

    $$\overline{H}(\mathbf{X}) = \text{p-}\limsup_{n \to \infty} \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)}$$

and call $\overline{H}(\mathbf{X})$ the spectral sup-entropy rate of $\mathbf{X}$. * Here, notice that $\overline{H}(\mathbf{X}) \ge
0$ follows since $P_{X^n}(X^n) \le 1$.

Then, we obtain the following theorem that is a generalized version of
Theorem 1.1.1.

Theorem 1.3.1 (Han and Verdú [46]).

    $$R_f(\mathbf{X}) = \overline{H}(\mathbf{X}).$$

In order to prove such a generalized theorem, the Fano inequality and the
law of large numbers described in §1.1 are no longer available. We need
the following two simple, but powerful, lemmas:

Lemma 1.3.1. Let $M_n$ be an arbitrarily given positive integer. Then, for all
$n = 1, 2, \cdots$ there exists an $(n, M_n, \varepsilon_n)$-code satisfying

    $$\varepsilon_n \le \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} \ge \frac{1}{n} \log M_n \right\}.$$

* In Han and Verdú [46] the spectral sup-entropy rate is called the sup-entropy
rate. However, we change its name in this book so as to avoid confusion.
Actually, in §1.7 we will define the (sup-)entropy rate as the limit (superior) of the
expectation of the entropy density rate.

Proof. Define

    $$T_n = \left\{ \mathbf{x} \in \mathcal{X}^n \;\middle|\; \frac{1}{n} \log \frac{1}{P_{X^n}(\mathbf{x})} < \frac{1}{n} \log M_n \right\}.$$

Then, we have $P_{X^n}(\mathbf{x}) > \frac{1}{M_n}$ for any $\mathbf{x} \in T_n$. Since it holds that

    $$1 \ge \sum_{\mathbf{x} \in T_n} P_{X^n}(\mathbf{x}) \ge \sum_{\mathbf{x} \in T_n} \frac{1}{M_n} = \frac{|T_n|}{M_n},$$

we have $|T_n| \le M_n$. Therefore, there exists a pair of an encoder and a decoder
$(\varphi_n, \psi_n)$ with a code $\mathcal{M}_n$ of size $M_n$ that can correctly decode all elements
of $T_n$ (see 1) in the proof of Theorem 1.1.1). Then, the error probability $\varepsilon_n$
can be evaluated as

    $$\varepsilon_n = \Pr\{X^n \neq \psi_n(\varphi_n(X^n))\}
                    \le \Pr\{T_n^c\} = \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} \ge \frac{1}{n} \log M_n \right\},$$

where the superscript “c” denotes the complement. □
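The code constructed in this proof can be written out explicitly for a small example. The following is a minimal sketch, assuming a hypothetical (not i.i.d.) distribution on five sequences of length 2 and $M_n = 4$; the function name `lemma_131_code` and the data are assumptions of this sketch.

```python
import math

def lemma_131_code(pmf, n, M):
    """Build the encoder/decoder from the proof; return (|T_n|, error probability, Pr{T_n^c})."""
    # T_n: sequences whose entropy density (1/n) log 1/P(x) is below (1/n) log M
    T = [x for x, p in pmf.items() if p > 0 and math.log(1 / p) / n < math.log(M) / n]
    assert len(T) <= M                       # guaranteed by the counting argument
    encoder = {x: 1 for x in pmf}            # everything outside T_n is mapped to index 1
    for i, x in enumerate(T, start=1):
        encoder[x] = i                       # one-to-one on T_n
    decoder = {i: x for i, x in enumerate(T, start=1)}
    error = sum(p for x, p in pmf.items() if decoder.get(encoder[x]) != x)
    tail = sum(p for x, p in pmf.items() if x not in T)
    return len(T), error, tail

# hypothetical distribution of X^2 over five sequences (an assumption)
pmf = {"aa": 0.40, "ab": 0.27, "ba": 0.18, "bb": 0.10, "ac": 0.05}
size_T, error, tail = lemma_131_code(pmf, n=2, M=4)
print(size_T, error, tail)
```

In this sketch only the elements of $T_n$ are decoded correctly, so the error probability equals $\Pr\{T_n^c\}$; a code that also uses the remaining indices could do better, which is consistent with the lemma being an upper bound.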



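Lemma 1.3.2. For all $n = 1, 2, \cdots$, any $(n, M_n, \varepsilon_n)$-code satisfies

    $$\varepsilon_n \ge \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} \ge \frac{1}{n} \log M_n + \gamma \right\} - e^{-n\gamma},$$

where $\gamma > 0$ is an arbitrary constant.

Proof. Define $T_n$ and $S_n$ by

    $$T_n = \left\{ \mathbf{x} \in \mathcal{X}^n \;\middle|\; \frac{1}{n} \log \frac{1}{P_{X^n}(\mathbf{x})} \ge \frac{1}{n} \log M_n + \gamma \right\}$$

and

    $$S_n = \{ \mathbf{x} \in \mathcal{X}^n \mid \mathbf{x} = \psi_n(\varphi_n(\mathbf{x})) \}.$$

Then, it follows that

    $$\Pr\{T_n\} = \Pr\{T_n \cap S_n^c\} + \Pr\{T_n \cap S_n\}
                 \le \Pr\{S_n^c\} + \Pr\{T_n \cap S_n\}
                 = \varepsilon_n + \Pr\{T_n \cap S_n\}.$$

On the other hand, since we have $P_{X^n}(\mathbf{x}) \le \frac{e^{-n\gamma}}{M_n}$ for $\mathbf{x} \in T_n$, it holds that

    $$\Pr\{T_n \cap S_n\} = \sum_{\mathbf{x} \in T_n \cap S_n} P_{X^n}(\mathbf{x})
                          \le \sum_{\mathbf{x} \in T_n \cap S_n} \frac{e^{-n\gamma}}{M_n}
                          \le |S_n| \frac{e^{-n\gamma}}{M_n}
                          \le e^{-n\gamma},$$

where $|S_n| \le M_n$ is used for obtaining the last inequality. Consequently, we
obtain $\Pr\{T_n\} \le \varepsilon_n + e^{-n\gamma}$. □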
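Lemma 1.3.2 holds for every code, so in particular for the code with the smallest error probability, which decodes the $M_n$ most probable sequences correctly. The following is a minimal sketch that compares this smallest error with the lower bound, assuming a hypothetical randomly generated distribution on binary sequences of length 4 with $M_n = 4$ and $\gamma = 0.1$; all names and values are assumptions of this sketch.

```python
import itertools
import math
import random

def converse_bound(pmf, n, M, gamma):
    """Pr{(1/n) log 1/P(X^n) >= (1/n) log M + gamma} - e^{-n*gamma}."""
    threshold = math.log(M) / n + gamma
    tail = sum(p for p in pmf.values() if p > 0 and math.log(1 / p) / n >= threshold)
    return tail - math.exp(-n * gamma)

def best_error(pmf, M):
    """Error probability of a code that decodes the M most probable sequences correctly."""
    return 1.0 - sum(sorted(pmf.values(), reverse=True)[:M])

rng = random.Random(1)            # fixed seed; the distribution is made-up test data
n, M, gamma = 4, 4, 0.1
seqs = ["".join(s) for s in itertools.product("01", repeat=n)]
weights = [rng.random() ** 3 for _ in seqs]
pmf = {s: w / sum(weights) for s, w in zip(seqs, weights)}
print(best_error(pmf, M), ">=", converse_bound(pmf, n, M, gamma))
```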

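Proof of Theorem 1.3.1.

1) Direct part:

Letting $\gamma > 0$ be an arbitrary constant, we prove that

    $$R = \overline{H}(\mathbf{X}) + \gamma$$

is achievable. Set $M_n = e^{nR}$. Then, it clearly holds that

    $$\limsup_{n \to \infty} \frac{1}{n} \log M_n \le R.$$

We note here that Lemma 1.3.1 guarantees the existence of an $(n, M_n, \varepsilon_n)$-code
satisfying

    $$\varepsilon_n \le \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} \ge \overline{H}(\mathbf{X}) + \gamma \right\}.$$    (1.3.2)

However, the right-hand side of (1.3.2) converges to 0 as $n \to \infty$ from the
definition of $\overline{H}(\mathbf{X})$. Hence, we obtain $\lim_{n \to \infty} \varepsilon_n = 0$.

2) Converse part:

Though the Fano inequality given in §1.1 is conventionally used for proving
the converse part, the Fano inequality no longer gives the lower bound
needed for a general source $\mathbf{X}$. It is Lemma 1.3.2 that is used instead
of the Fano inequality in such a general situation.

We show that the assumption that $R = \overline{H}(\mathbf{X}) - 3\gamma$ is achievable leads
to a contradiction, where $\gamma > 0$ is an arbitrary constant. Assume that there
exists an $(n, M_n, \varepsilon_n)$-code satisfying $\limsup_{n \to \infty} \frac{1}{n} \log M_n \le R$ and $\lim_{n \to \infty} \varepsilon_n = 0$.
Since we have $\frac{1}{n} \log M_n \le R + \gamma$ ($\forall n \ge n_0$) for such a code, Lemma 1.3.2
implies that

    $$\varepsilon_n \ge \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} \ge R + 2\gamma \right\} - e^{-n\gamma}
                    = \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} \ge \overline{H}(\mathbf{X}) - \gamma \right\} - e^{-n\gamma}.$$    (1.3.3)

We notice here that the definition of $\overline{H}(\mathbf{X})$ leads to

    $$\Pr\left\{ \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} \ge \overline{H}(\mathbf{X}) - \gamma \right\} > \varepsilon_0 \quad \text{for some } \varepsilon_0 > 0$$

for infinitely many $n$. However, since $e^{-n\gamma} \to 0$ as $n \to \infty$, $\varepsilon_n$ in the left-hand
side of (1.3.3) cannot satisfy $\varepsilon_n \to 0$ as $n \to \infty$, which is a contradiction. □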



Fig. 1.4. (Horizontal axis: $\frac{1}{n} \log \frac{1}{P_{X^n}(X^n)}$; marked point: $H(X)$.)



Example 1.3.1. Suppose that $\mathbf{X} = (X_1, X_2, \cdots)$ is a stationary memoryless
source subject to a probability distribution $P_X$ over a finite alphabet
$\mathcal{X}$. Then, the entropy-spectrum concentrates on a single point, the entropy
$H(X)$, as $n \to \infty$ (Fig. 1.4). As a result, $\overline{H}(\mathbf{X})$ coincides with the entropy
$H(X)$ of $X$, i.e.,

    $$R_f(\mathbf{X}) = \overline{H}(\mathbf{X}) = H(X).$$

This fact can be easily shown by using Chebyshev’s inequality given in §1.1
(the weak law of large numbers). Here, note that the variance of $\log \frac{1}{P_X(X)}$
is uniformly bounded with respect to $P_X$ since $\mathcal{X}$ is assumed to be a finite
alphabet: see Remark 3.1.1 in §3.1. Hence, Theorem 1.3.1 includes
Theorem 1.1.1 as a special case. □
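The concentration of the entropy-spectrum can be computed exactly for a binary memoryless source, because the entropy density rate depends on a sequence only through its number of ones. The following is a minimal sketch, assuming a hypothetical Bernoulli(0.2) source and tolerance $\gamma = 0.05$ (both values, and the function name, are assumptions of this sketch).

```python
import math

def spectrum_mass_outside(p, n, gamma):
    """Exact Pr{ |(1/n) log 1/P(X^n) - H(X)| >= gamma } for an i.i.d. Bernoulli(p) source."""
    H = -p * math.log(p) - (1 - p) * math.log(1 - p)
    total = 0.0
    for k in range(n + 1):                                   # k = number of ones in the sequence
        log_prob_x = k * math.log(p) + (n - k) * math.log(1 - p)
        if abs(-log_prob_x / n - H) >= gamma:
            # log of binomial coefficient via lgamma, to avoid overflow for large n
            log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            total += math.exp(log_binom + log_prob_x)
    return total

for n in (10, 100, 1000):
    print(n, spectrum_mass_outside(0.2, n, 0.05))
```

The printed values are the probability mass of the entropy-spectrum lying at distance at least $\gamma$ from $H(X)$; they shrink as the block length grows.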



Remark 1.3.1. In Example 1.3.1 above we have proved $\overline{H}(\mathbf{X}) = H(X)$ by
using Chebyshev’s inequality under the assumption that $\mathcal{X}$ is a finite source
alphabet. However, if we use the following theorem by Khintchin, we can
prove $\overline{H}(\mathbf{X}) = H(X)$ without using this assumption.

Theorem 1.3.2 (The law of large numbers: Khintchin [9]). Let $(Z_1,
Z_2, \cdots)$ be a sequence of arbitrary real-valued random variables independently
generated subject to an identical probability distribution. If $\mathrm{E}|Z_1| <
+\infty$ is satisfied, then

    $$\frac{1}{n} \sum_{i=1}^{n} Z_i \to \mathrm{E} Z_1 \quad \text{almost surely as } n \to \infty.$$    (1.3.4)

We define $Z_i = \log \frac{1}{P_X(X_i)}$ for the stationary memoryless source $\mathbf{X} =
(X_1, X_2, \cdots)$ subject to a probability distribution $P_X$ that appeared in §1.1.
Theorem 1.3.2 implies that the entropy density rate of the source $\mathbf{X}$

    $$\frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} = \frac{1}{n} \sum_{i=1}^{n} \log \frac{1}{P_X(X_i)} = \frac{1}{n} \sum_{i=1}^{n} Z_i$$

converges in probability to $\mathrm{E} Z_1 = H(X)$ (entropy). Notice here that
almost sure convergence implies convergence in probability. Thus we
have

    $$R_f(\mathbf{X}) = \overline{H}(\mathbf{X}) = H(X).$$

This means that, even if $\mathcal{X}$ is a countably infinite source alphabet,
Theorem 1.3.1 tells us that Theorem 1.1.1 still holds (in particular, $R_f(\mathbf{X}) = +\infty$
if $H(X) = +\infty$). This result cannot be obtained from the Fano inequality
used for proving the converse part of Theorem 1.1.1. However, it is important
to notice that using Khintchin’s law of large numbers instead of Chebyshev’s
inequality tells us nothing about the speed of the almost sure convergence
(or the convergence in probability) of $\frac{1}{n} \sum_{i=1}^{n} Z_i$ as a function of $n$. □



Example 1.3.2. Example 1.3.1 and Remark 1.3.1 can be generalized in the
following way. It is known that for any stationary ergodic source

    $$\mathbf{X} = (X_1, X_2, \cdots)$$

(cf. Gallager [30], Gray [31], Cover and Thomas [17]) with a countably infinite
alphabet $\mathcal{X}$, $\frac{1}{n} \log \frac{1}{P_{X^n}(X^n)}$ converges almost surely (therefore in probability)
to the entropy rate of the source $\mathbf{X}$ defined by

    $$H(\mathbf{X}) = \lim_{n \to \infty} \frac{1}{n} H(X^n)$$

as $n \to \infty$ (cf. Barron [7]). This implies that $\overline{H}(\mathbf{X}) = H(\mathbf{X})$ (see
Remark 1.7.3 in §1.7). Hence, in view of Theorem 1.3.1 we have the formula

    $$R_f(\mathbf{X}) = \overline{H}(\mathbf{X}) = \lim_{n \to \infty} \frac{1}{n} H(X^n).$$    (1.3.5)

If we consider a stationary $m$-th order irreducible Markov source $\mathbf{X} =
(X_1, X_2, \cdots)$ (cf. Feller [29], Gallager [30]) with a finite alphabet $\mathcal{X}$ as a
special case of the stationary ergodic source, (1.3.5) can be expressed as
follows by using the conditional entropy:

    $$R_f(\mathbf{X}) = \overline{H}(\mathbf{X}) = H(X_{m+1} | X_1 X_2 \cdots X_m).$$    (1.3.6)
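For a first-order chain ($m = 1$), the conditional entropy $H(X_2 | X_1)$ in the Markov formula above can be evaluated from the transition matrix and its stationary distribution. The following is a minimal sketch, assuming a hypothetical two-state transition matrix and, beyond irreducibility, aperiodicity so that plain power iteration converges; the function names and the matrix are assumptions of this sketch.

```python
import math

def stationary_distribution(P, iterations=10_000):
    """Stationary distribution of an irreducible, aperiodic chain by power iteration."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
    return pi

def markov_entropy_rate(P):
    """H(X_2 | X_1) in nats for a stationary first-order Markov source with transition matrix P."""
    pi = stationary_distribution(P)
    return -sum(
        pi[i] * P[i][j] * math.log(P[i][j])
        for i in range(len(P)) for j in range(len(P)) if P[i][j] > 0
    )

P = [[0.9, 0.1],      # hypothetical transition probabilities P[i][j] = Pr{next = j | current = i}
     [0.5, 0.5]]
print(stationary_distribution(P), markov_entropy_rate(P))
```

When all rows of the matrix are identical the source is memoryless and the value reduces to the entropy of the common row.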



Example 1.3.3 (Infimum achievable fixed-length coding rate for
nonstationary memoryless sources). If we consider a memoryless but
nonstationary source

    $$\mathbf{X} = (X_1, X_2, \cdots)$$

with a finite alphabet $\mathcal{X}$, we have the formula

    $$R_f(\mathbf{X}) = \overline{H}(\mathbf{X}) = \limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} H(X_i)$$

from Theorem 1.3.1 and Chebyshev’s inequality (cf. Csiszár and Körner [19]). □



Remark 1.3.2. So far we have assumed that the random variable $X^n$ of a
source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ takes values in the $n$-th Cartesian product $\mathcal{X}^n$ of a
source alphabet $\mathcal{X}$. However, as can be seen from the proofs of Lemma 1.3.1,
Lemma 1.3.2 and Theorem 1.3.1, we have the same results if $\mathcal{X}^n$ and $X^n$ are
replaced with an arbitrary countably infinite set $\mathcal{Z}_n$ and an arbitrary random
variable $Z_n$ taking values in $\mathcal{Z}_n$, respectively. This remark is valid for all
arguments not only in this chapter but also in the other chapters, except for the
cases where only an alphabet in the form of a Cartesian product makes sense.
In particular, in the chapters after Chapter 3, $\mathcal{Z}_n$ can be an arbitrary (not
restricted to countably infinite) set. This fact reveals an important
property: arguments from the information-spectrum approach are based
on a certain “invariancy.” See §1.12 for more details. □



1.4 Fixed-Length Coding for Mixed Sources 

For two given general sources $\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$ and $\mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}$, we call a
general source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ defined by

    $$P_{X^n}(\mathbf{x}) = \alpha_1 P_{X_1^n}(\mathbf{x}) + \alpha_2 P_{X_2^n}(\mathbf{x}) \quad (\mathbf{x} \in \mathcal{X}^n)$$    (1.4.1)

the mixed source of $\mathbf{X}_1$ and $\mathbf{X}_2$, where $\alpha_1$ and $\alpha_2$ are constants satisfying
$\alpha_1 > 0$, $\alpha_2 > 0$ and $\alpha_1 + \alpha_2 = 1$.

In particular, if we consider the case where $\mathbf{X}_1$ and $\mathbf{X}_2$ are two different
stationary memoryless sources (or stationary ergodic sources), then the
mixed source $\mathbf{X}$ can be regarded as a typical and simple example of a
stationary nonergodic source. Since all of the stationary sources (typically, with
finite source alphabet $\mathcal{X}$) can be viewed as mixed sources obtained by mixing
stationary ergodic sources with respect to probability measures (e.g., cf. Gray
and Davisson [33], Billingsley [8]), we can clarify the structure of stationary
sources through a deep investigation of mixed sources. In this sense, the
study of mixed sources is one of the important subjects in information theory.

We have the following concise theorem on encoding of the mixed
source defined by (1.4.1).

Theorem 1.4.1. For the mixed source $\mathbf{X}$ of two general sources $\mathbf{X}_1$ and $\mathbf{X}_2$,
$R_f(\mathbf{X})$ is given by

    $$R_f(\mathbf{X}) = \max(\overline{H}(\mathbf{X}_1), \overline{H}(\mathbf{X}_2)).$$






In view of Theorem 1.3.1, it suffices to establish the following lemma in 
order to prove Theorem 1.4.1. 

Lemma 1.4.1. $\overline{H}(\mathbf{X}) = \max(\overline{H}(\mathbf{X}_1), \overline{H}(\mathbf{X}_2))$.



We need the following lemma for proving this lemma. 

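Lemma 1.4.2. Let $\{Z_n^{(1)}\}_{n=1}^{\infty}$ and $\{Z_n^{(2)}\}_{n=1}^{\infty}$ be two sequences of random variables taking values in $\mathcal{Z}_n$, and define $\{Z_n\}_{n=1}^{\infty}$ by

$$P_{Z_n}(z) = \alpha_1 P_{Z_n^{(1)}}(z) + \alpha_2 P_{Z_n^{(2)}}(z) \quad (z \in \mathcal{Z}_n).$$

Then, for an arbitrary sequence of functions $\{f_n\}_{n=1}^{\infty}$ defined over $\mathcal{Z}_n$, we have

$$\operatorname*{p-limsup}_{n\to\infty} f_n(Z_n) = \max\left( \operatorname*{p-limsup}_{n\to\infty} f_n(Z_n^{(1)}),\ \operatorname*{p-limsup}_{n\to\infty} f_n(Z_n^{(2)}) \right).$$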

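Proof. If we consider the random variable $Q$ satisfying $\Pr\{Q = 1\} = \alpha_1$ and $\Pr\{Q = 2\} = \alpha_2$, then $Z_n$ can be expressed as

$$Z_n = \begin{cases} Z_n^{(1)} & \text{if } Q = 1, \\ Z_n^{(2)} & \text{if } Q = 2. \end{cases}$$

Then, for any $\lambda$ it holds that

$$\begin{aligned} \Pr\{f_n(Z_n) > \lambda\} &= \alpha_1 \Pr\{f_n(Z_n) > \lambda \mid Q = 1\} + \alpha_2 \Pr\{f_n(Z_n) > \lambda \mid Q = 2\} \\ &= \alpha_1 \Pr\{f_n(Z_n^{(1)}) > \lambda\} + \alpha_2 \Pr\{f_n(Z_n^{(2)}) > \lambda\}. \end{aligned} \tag{1.4.2}$$

Since

$$\lim_{n\to\infty} \Pr\{f_n(Z_n) > \lambda\} = 0$$

for an arbitrarily fixed $\lambda$ satisfying $\lambda > \operatorname*{p-limsup}_{n\to\infty} f_n(Z_n)$, it follows that

$$\lim_{n\to\infty} \Pr\{f_n(Z_n^{(1)}) > \lambda\} = 0, \quad \lim_{n\to\infty} \Pr\{f_n(Z_n^{(2)}) > \lambda\} = 0.$$

That is, we obtain

$$\operatorname*{p-limsup}_{n\to\infty} f_n(Z_n^{(1)}) \le \operatorname*{p-limsup}_{n\to\infty} f_n(Z_n), \tag{1.4.3}$$

$$\operatorname*{p-limsup}_{n\to\infty} f_n(Z_n^{(2)}) \le \operatorname*{p-limsup}_{n\to\infty} f_n(Z_n). \tag{1.4.4}$$

On the other hand, for an arbitrarily fixed $\lambda$ satisfying $\lambda < \operatorname*{p-limsup}_{n\to\infty} f_n(Z_n)$, there exists some $\varepsilon_0 > 0$ such that

$$\Pr\{f_n(Z_n) > \lambda\} > \varepsilon_0 > 0$$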






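for infinitely many $n$. This means that

$$\limsup_{n\to\infty} \Pr\{f_n(Z_n) > \lambda\} \ge \varepsilon_0 > 0,$$

which yields

$$\limsup_{n\to\infty} \Pr\{f_n(Z_n^{(1)}) > \lambda\} > 0 \quad \text{or} \quad \limsup_{n\to\infty} \Pr\{f_n(Z_n^{(2)}) > \lambda\} > 0.$$

These two inequalities imply that

$$\operatorname*{p-limsup}_{n\to\infty} f_n(Z_n^{(1)}) \ge \operatorname*{p-limsup}_{n\to\infty} f_n(Z_n)$$

or

$$\operatorname*{p-limsup}_{n\to\infty} f_n(Z_n^{(2)}) \ge \operatorname*{p-limsup}_{n\to\infty} f_n(Z_n).$$

Therefore, we obtain

$$\max\left( \operatorname*{p-limsup}_{n\to\infty} f_n(Z_n^{(1)}),\ \operatorname*{p-limsup}_{n\to\infty} f_n(Z_n^{(2)}) \right) \ge \operatorname*{p-limsup}_{n\to\infty} f_n(Z_n). \tag{1.4.5}$$

Summarizing (1.4.3)-(1.4.5) completes the proof of Lemma 1.4.2. □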
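
The decomposition (1.4.2) can be checked numerically in a simple special case. The sketch below is not part of the original text; it assumes that $Z_n^{(i)}$ is an i.i.d. Bernoulli($p_i$) sequence of length $n$ and that $f_n$ is the fraction of ones, so that the p-limsup of $f_n(Z_n^{(i)})$ equals $p_i$. The function names and the parameter values are illustrative assumptions.

```python
from math import comb

def binom_tail(n, p, lam):
    """Exact Pr{ K/n > lam } for K ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if k > lam * n)

def mixture_tail(n, a1, p1, a2, p2, lam):
    """Right-hand side of (1.4.2) when f_n is the fraction of ones."""
    return a1 * binom_tail(n, p1, lam) + a2 * binom_tail(n, p2, lam)

if __name__ == "__main__":
    a1, p1, a2, p2 = 0.3, 0.2, 0.7, 0.6   # illustrative values (assumption)
    for n in (50, 200, 400):
        # lam = 0.4 lies between p1 and p2; lam = 0.7 lies above max(p1, p2)
        print(n, [mixture_tail(n, a1, p1, a2, p2, lam) for lam in (0.4, 0.7)])
```

For a threshold between $p_1$ and $p_2$ the tail probability stays near $\alpha_2$ and does not vanish, while for a threshold above $\max(p_1, p_2)$ it tends to zero, which is consistent with the p-limsup of the mixture being the maximum of the two.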



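Proof of Lemma 1.4.1.

We use the following simple inequality (Verdú and Han [91]):

$$\frac{1}{n}\log\min(\alpha_1, \alpha_2) + \max(u, v) \le \frac{1}{n}\log\left[\alpha_1 e^{nu} + \alpha_2 e^{nv}\right] \le \max(u, v). \tag{1.4.6}$$

By setting

$$u = \frac{1}{n}\log P_{X_1^n}(X^n), \quad v = \frac{1}{n}\log P_{X_2^n}(X^n),$$

we have

$$-\frac{c_0}{n} + \max\left(\frac{1}{n}\log P_{X_1^n}(X^n), \frac{1}{n}\log P_{X_2^n}(X^n)\right) \le \frac{1}{n}\log P_{X^n}(X^n) \le \max\left(\frac{1}{n}\log P_{X_1^n}(X^n), \frac{1}{n}\log P_{X_2^n}(X^n)\right), \tag{1.4.7}$$

where $c_0 = -\log\min(\alpha_1, \alpha_2)$. We rewrite this inequality in the following form:

$$\Delta_n(X^n) \le \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \le \Delta_n(X^n) + \frac{c_0}{n}, \tag{1.4.8}$$

where

$$\Delta_n(\mathbf{x}) = -\max\left(\frac{1}{n}\log P_{X_1^n}(\mathbf{x}), \frac{1}{n}\log P_{X_2^n}(\mathbf{x})\right).$$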






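Inequality (1.4.8) immediately implies that

$$\operatorname*{p-limsup}_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} = \operatorname*{p-limsup}_{n\to\infty} \Delta_n(X^n). \tag{1.4.9}$$

On the other hand, we have

$$\operatorname*{p-limsup}_{n\to\infty} \Delta_n(X^n) = \max\left( \operatorname*{p-limsup}_{n\to\infty} \Delta_n(X_1^n),\ \operatorname*{p-limsup}_{n\to\infty} \Delta_n(X_2^n) \right) \tag{1.4.10}$$

from Lemma 1.4.2. Here, if we arbitrarily choose a sequence $\{\gamma_n\}$ satisfying $\gamma_1 > \gamma_2 > \cdots > 0$, $\gamma_n \to 0$ and $n\gamma_n \to \infty$, it holds that

$$\Pr\left\{ \frac{1}{n}\log P_{X_1^n}(X_1^n) - \frac{1}{n}\log P_{X_2^n}(X_1^n) \le -\gamma_n \right\} = \sum_{\mathbf{x} \in B_n} P_{X_1^n}(\mathbf{x}) \le \sum_{\mathbf{x} \in B_n} P_{X_2^n}(\mathbf{x})\, e^{-n\gamma_n} \le e^{-n\gamma_n} \to 0 \quad (n \to \infty),$$

where

$$B_n = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n}\log P_{X_1^n}(\mathbf{x}) - \frac{1}{n}\log P_{X_2^n}(\mathbf{x}) \le -\gamma_n \right\}.$$

This means that

$$\frac{1}{n}\log P_{X_1^n}(X_1^n) > \frac{1}{n}\log P_{X_2^n}(X_1^n) - \gamma_n$$

with probability at least $1 - e^{-n\gamma_n}$. Hence, with probability at least $1 - e^{-n\gamma_n}$ we have

$$\frac{1}{n}\log P_{X_1^n}(X_1^n) \le \max\left( \frac{1}{n}\log P_{X_1^n}(X_1^n), \frac{1}{n}\log P_{X_2^n}(X_1^n) \right) < \frac{1}{n}\log P_{X_1^n}(X_1^n) + \gamma_n,$$

that is,

$$-\gamma_n + \frac{1}{n}\log\frac{1}{P_{X_1^n}(X_1^n)} < \Delta_n(X_1^n) \le \frac{1}{n}\log\frac{1}{P_{X_1^n}(X_1^n)}. \tag{1.4.11}$$

Consequently, we obtain

$$\operatorname*{p-limsup}_{n\to\infty} \Delta_n(X_1^n) = \operatorname*{p-limsup}_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{X_1^n}(X_1^n)}. \tag{1.4.12}$$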

In the same manner we also obtain 






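$$\operatorname*{p-limsup}_{n\to\infty} \Delta_n(X_2^n) = \operatorname*{p-limsup}_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{X_2^n}(X_2^n)}. \tag{1.4.13}$$

By summarizing (1.4.9), (1.4.10), (1.4.12) and (1.4.13), we have

$$\operatorname*{p-limsup}_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} = \max\left( \operatorname*{p-limsup}_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{X_1^n}(X_1^n)},\ \operatorname*{p-limsup}_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{X_2^n}(X_2^n)} \right), \tag{1.4.14}$$

which establishes

$$\overline{H}(\mathbf{X}) = \max(\overline{H}(\mathbf{X}_1), \overline{H}(\mathbf{X}_2)). \tag{1.4.15}$$

□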
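
The elementary inequality (1.4.6) that drives this proof is easy to test numerically. The following sketch is an illustration added here, not the authors' code; the helper names, the tolerance and the sampled parameter ranges are assumptions. The middle term is evaluated in a numerically stable way by factoring out $\max(u, v)$.

```python
import math
import random

def log_mix(n, a1, a2, u, v):
    """(1/n) * log(a1*exp(n*u) + a2*exp(n*v)), evaluated without overflow."""
    m = max(u, v)
    return m + math.log(a1 * math.exp(n * (u - m)) + a2 * math.exp(n * (v - m))) / n

def satisfies_1_4_6(n, a1, u, v, tol=1e-12):
    """Check the two-sided bound (1.4.6) for weights a1 and a2 = 1 - a1."""
    a2 = 1.0 - a1
    lower = math.log(min(a1, a2)) / n + max(u, v)
    upper = max(u, v)
    return lower - tol <= log_mix(n, a1, a2, u, v) <= upper + tol

if __name__ == "__main__":
    rng = random.Random(0)
    trials = [(rng.randint(1, 1000), rng.uniform(0.01, 0.99),
               rng.uniform(-5.0, 0.0), rng.uniform(-5.0, 0.0)) for _ in range(1000)]
    print(all(satisfies_1_4_6(n, a1, u, v) for n, a1, u, v in trials))
```

The gap between the lower and upper bounds is $c_0 / n$, which vanishes as $n$ grows; this is why the mixture weights do not affect the limiting spectrum location.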



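Remark 1.4.1. The proof above guarantees, by (1.4.11) and its counterpart for $\mathbf{X}_2$, that

$$\frac{1}{n}\log\frac{1}{P_{X_1^n}(X_1^n)} = \Delta_n(X_1^n) + o(1), \quad \frac{1}{n}\log\frac{1}{P_{X_2^n}(X_2^n)} = \Delta_n(X_2^n) + o(1)$$

with probability tending to one, for sufficiently large $n$. In addition, since

$$\frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} = \Delta_n(X^n) + o(1)$$

by (1.4.8), the entropy-spectrum of $\frac{1}{n}\log\frac{1}{P_{X^n}(X^n)}$ asymptotically becomes the mixture of the entropy-spectra of $\frac{1}{n}\log\frac{1}{P_{X_1^n}(X_1^n)}$ and $\frac{1}{n}\log\frac{1}{P_{X_2^n}(X_2^n)}$ with weights $\alpha_1$ and $\alpha_2$, respectively. Such an interpretation plays a crucial role in understanding the quantities appearing in this book. We call this way of interpretation the information-spectrum approach. □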



Remark 1.4.2. The way of establishing Lemma 1.4.1 is still valid when the source alphabet $\mathcal{X}$ is general (not necessarily countably infinite) if we interpret $P_{X^n}(\cdot)$, $P_{X_1^n}(\cdot)$ and $P_{X_2^n}(\cdot)$ as adequate probability density functions (or, more formally, probability measure elements). This remark is also valid for Lemma 1.4.3 below. □



Example 1.4.1. Let $\mathbf{X}_1$ and $\mathbf{X}_2$ be stationary memoryless sources subject to probability distributions $P_1$ and $P_2$, respectively. Denote by $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ the mixed source of $\mathbf{X}_1$ and $\mathbf{X}_2$ defined by

$$P_{X^n}(\mathbf{x}) = \alpha_1 \prod_{i=1}^{n} P_1(x_i) + \alpha_2 \prod_{i=1}^{n} P_2(x_i),$$

where $\mathbf{x} = (x_1, \cdots, x_n) \in \mathcal{X}^n$. Then, owing to Remark 1.4.1, the information-spectrum of

$$\frac{1}{n}\log\frac{1}{P_{X^n}(X^n)}$$

converges, as $n \to \infty$, to the information-spectrum given in Fig. 1.5, which has two peaks of probabilities $\alpha_1$ and $\alpha_2$, respectively. From Lemma 1.4.1, we have

$$\overline{H}(\mathbf{X}) = \max(H(P_1), H(P_2)).$$

Then, Theorem 1.4.1 guarantees that the right-hand side of this equation gives $R_f(\mathbf{X})$ of the mixed source $\mathbf{X}$ (cf. Shannon [77]). □
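
The two-peak spectrum of this example can be computed exactly for binary sources, because $P_{X^n}(\mathbf{x})$ depends on $\mathbf{x}$ only through its number of ones. The sketch below is a hedged illustration rather than anything taken from the original: it assumes Bernoulli components, natural logarithms (nats), and illustrative parameter values; all function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy of a Bernoulli(p) distribution in nats."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def log_sequence_prob(n, k, p):
    """log-probability of one fixed length-n sequence with k ones under Bernoulli(p)."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

def log_add(a, b):
    """log(exp(a) + exp(b)) computed stably."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def spectrum(n, a1, p1, a2, p2):
    """List of (value, probability) pairs for (1/n) log 1/P_{X^n}(X^n), grouped by number of ones."""
    points = []
    for k in range(n + 1):
        log_px = log_add(math.log(a1) + log_sequence_prob(n, k, p1),
                         math.log(a2) + log_sequence_prob(n, k, p2))
        log_count = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        points.append((-log_px / n, math.exp(log_count + log_px)))
    return points

def mass_near(points, center, width):
    """Total probability of spectrum values within `width` of `center`."""
    return sum(pr for value, pr in points if abs(value - center) <= width)

if __name__ == "__main__":
    a1, p1, a2, p2 = 0.3, 0.1, 0.7, 0.4   # illustrative values (assumption)
    pts = spectrum(2000, a1, p1, a2, p2)
    print(mass_near(pts, binary_entropy(p1), 0.05), mass_near(pts, binary_entropy(p2), 0.05))
```

For large $n$ the mass near $H(P_1)$ approaches $\alpha_1$ and the mass near $H(P_2)$ approaches $\alpha_2$, and the fixed-length rate is governed by the larger of the two locations.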



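Fig. 1.5. [Figure: two-point information-spectrum with peaks of probabilities $\alpha_1$ and $\alpha_2$ located at $H(P_1)$ and $H(P_2)$.]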



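We can generalize Lemma 1.4.1 in the following way. Suppose that countably infinitely many general sources $\mathbf{X}_i = \{X_i^n\}_{n=1}^{\infty}$ $(i = 1, 2, \cdots)$ are arbitrarily given. We call a source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ defined by

$$P_{X^n}(\mathbf{x}) = \sum_{i=1}^{\infty} \alpha_i P_{X_i^n}(\mathbf{x}) \quad (\forall n = 1, 2, \cdots; \forall \mathbf{x} \in \mathcal{X}^n) \tag{1.4.16}$$

the mixed source of a source family $\{\mathbf{X}_i\}_{i=1}^{\infty}$, where $\alpha_i$ $(i = 1, 2, \cdots)$ are constants satisfying

$$\sum_{i=1}^{\infty} \alpha_i = 1 \quad (\alpha_i \ge 0 : \forall i = 1, 2, \cdots).$$

Clearly, this mixed source is a generalized version of the mixed source defined at the beginning of this section. Defining the spectral sup-entropy rates by

$$\overline{H}(\mathbf{X}_i) = \operatorname*{p-limsup}_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{X_i^n}(X_i^n)} \quad (i = 1, 2, \cdots), \tag{1.4.17}$$

$$\overline{H}(\mathbf{X}) = \operatorname*{p-limsup}_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)}, \tag{1.4.18}$$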






we have the following lemma. 

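Lemma 1.4.3. For the mixed source $\mathbf{X}$ defined by (1.4.16), we have

$$\overline{H}(\mathbf{X}) = \sup_{i \ge 1 : \alpha_i > 0} \overline{H}(\mathbf{X}_i). \tag{1.4.19}$$

Proof. Notice that we can assume that $\alpha_i > 0$ $(i = 1, 2, \cdots)$ without loss of generality.

1) We first prove

$$\overline{H}(\mathbf{X}) \ge \sup_{i \ge 1} \overline{H}(\mathbf{X}_i). \tag{1.4.20}$$

To this end, for an arbitrary positive integer $k$ we define $\alpha'_{k+1}$ by

$$\alpha'_{k+1} = \sum_{i=k+1}^{\infty} \alpha_i$$

and a source $\tilde{\mathbf{X}}_k = \{\tilde{X}_k^n\}_{n=1}^{\infty}$ by

$$P_{\tilde{X}_k^n}(\mathbf{x}) = \frac{1}{\alpha'_{k+1}} \sum_{i=k+1}^{\infty} \alpha_i P_{X_i^n}(\mathbf{x}).$$

Then, the source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ can be expressed as the mixed source of a finite number of sources $\mathbf{X}_i$ $(i = 1, \cdots, k)$ and $\tilde{\mathbf{X}}_k$ with the probability distribution

$$P_{X^n}(\mathbf{x}) = \sum_{i=1}^{k} \alpha_i P_{X_i^n}(\mathbf{x}) + \alpha'_{k+1} P_{\tilde{X}_k^n}(\mathbf{x}).$$

If we apply Lemma 1.4.1 repeatedly $k$ times, it follows that

$$\overline{H}(\mathbf{X}) = \max\{\overline{H}(\mathbf{X}_1), \cdots, \overline{H}(\mathbf{X}_k), \overline{H}(\tilde{\mathbf{X}}_k)\} \ge \overline{H}(\mathbf{X}_k).$$

Since $k$ is arbitrary, this inequality leads to (1.4.20).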



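2) We next prove

$$\overline{H}(\mathbf{X}) \le \sup_{i \ge 1} \overline{H}(\mathbf{X}_i). \tag{1.4.21}$$

Let $R$ be an arbitrary constant satisfying

$$R > \sup_{i \ge 1} \overline{H}(\mathbf{X}_i) \tag{1.4.22}$$

and define

$$S_n(R) = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, P_{X^n}(\mathbf{x}) \le e^{-nR} \right\},$$

$$S_n^{(i)}(R) = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, P_{X_i^n}(\mathbf{x}) \le \frac{e^{-nR}}{\alpha_i} \right\} \quad (i = 1, 2, \cdots).$$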






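Since (1.4.16) guarantees that $\mathbf{x} \in S_n^{(i)}(R)$ $(\forall i = 1, 2, \cdots)$ for $\mathbf{x} \in S_n(R)$, we have

$$S_n(R) \subset S_n^{(i)}(R) \quad (\forall i = 1, 2, \cdots). \tag{1.4.23}$$

Then, from (1.4.16) again, it follows that

$$\begin{aligned} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R \right\} &= P_{X^n}(S_n(R)) \\ &= \sum_{i=1}^{\infty} \alpha_i P_{X_i^n}(S_n(R)) \\ &\le \sum_{i=1}^{\infty} \alpha_i P_{X_i^n}(S_n^{(i)}(R)) \\ &= \sum_{i=1}^{\infty} \alpha_i \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X_i^n}(X_i^n)} \ge R - \frac{1}{n}\log\frac{1}{\alpha_i} \right\}. \end{aligned} \tag{1.4.24}$$

However, from the definition of $\overline{H}(\mathbf{X}_i)$ and (1.4.22) we have

$$\lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X_i^n}(X_i^n)} \ge R - \frac{1}{n}\log\frac{1}{\alpha_i} \right\} = 0 \quad (\forall i = 1, 2, \cdots).$$

Consequently, by applying Fatou's lemma (cf. Billingsley [9]) to (1.4.24), we obtain

$$\lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R \right\} = 0.$$

This indicates that $\overline{H}(\mathbf{X}) \le R$ and therefore establishes (1.4.21). □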

Theorem 1.3.1 and Lemma 1.4.3 immediately yield the following theorem 
and corollary on the infimum achievable fixed-length coding rate. 

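Theorem 1.4.2. For the mixed source $\mathbf{X}$ defined by (1.4.16), we have

$$R_f(\mathbf{X}) = \sup_{i : \alpha_i > 0} \overline{H}(\mathbf{X}_i). \tag{1.4.25}$$

Corollary 1.4.1. If each source $\mathbf{X}_i$ is stationary and ergodic, then for the mixed source $\mathbf{X}$ defined by (1.4.16) we have

$$R_f(\mathbf{X}) = \sup_{i : \alpha_i > 0} H(\mathbf{X}_i), \tag{1.4.26}$$

where $H(\mathbf{X}_i)$ denotes the entropy rate (see Example 1.3.2 in §1.3) of $\mathbf{X}_i$. □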






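Example 1.4.2. In general, an m-th order stationary Markov source $\mathbf{X} = \{X^n = (X_1, X_2, \cdots, X_n)\}_{n=1}^{\infty}$ can be expressed as the mixed source of m-th order stationary irreducible Markov sources

$$\mathbf{X}_i = \{X_i^n = (X_1^{(i)}, \cdots, X_n^{(i)})\}_{n=1}^{\infty} \quad (i = 1, 2, \cdots, s)$$

whose probability distribution is given by

$$P_{X^n}(\mathbf{x}) = \sum_{i=1}^{s} \alpha_i P_{X_i^n}(\mathbf{x}) \quad \left(\alpha_i > 0,\ \sum_{i=1}^{s} \alpha_i = 1;\ \forall \mathbf{x} \in \mathcal{X}^n\right).$$

Then, from Corollary 1.4.1 we obtain the following formula:

$$R_f(\mathbf{X}) = \max_{1 \le i \le s} H(X_{m+1}^{(i)} \mid X_1^{(i)} \cdots X_m^{(i)}). \tag{1.4.27}$$

The entropy-spectrum of $\frac{1}{n}\log\frac{1}{P_{X^n}(X^n)}$ converges to the s-point spectrum with $s$ peaks of probabilities $\alpha_1, \alpha_2, \cdots, \alpha_s$ as $n \to \infty$. □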



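Remark 1.4.3. It is easy to verify that, if we consider the spectral inf-entropy rates (see §1.5)

$$\underline{H}(\mathbf{X}_i) = \operatorname*{p-liminf}_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{X_i^n}(X_i^n)} \quad (i = 1, 2, \cdots), \tag{1.4.28}$$

$$\underline{H}(\mathbf{X}) = \operatorname*{p-liminf}_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \tag{1.4.29}$$

instead of (1.4.17) and (1.4.18), we have

$$\underline{H}(\mathbf{X}) = \inf_{i \ge 1 : \alpha_i > 0} \underline{H}(\mathbf{X}_i). \tag{1.4.30}$$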



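Next, let us consider the mixed source obtained from a more general way of mixing. Let $\Phi$ be an arbitrary set (a probability space). Suppose that a general source $\mathbf{X}_\theta = \{X_\theta^n\}_{n=1}^{\infty}$ is attached to each $\theta \in \Phi$. Here, we assume that for all $n = 1, 2, \cdots$ and $\mathbf{x} \in \mathcal{X}^n$, the probabilities $P_{X_\theta^n}(\mathbf{x})$ with respect to $\mathbf{X}_\theta$ are measurable functions of $\theta$, where $\mathcal{X}$ denotes a source alphabet. If we fix an arbitrary probability measure $w$ on $\Phi$, we obtain a source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ with the probability distribution

$$P_{X^n}(\mathbf{x}) = \int_{\Phi} P_{X_\theta^n}(\mathbf{x})\, dw(\theta) \quad (\forall n = 1, 2, \cdots; \forall \mathbf{x} \in \mathcal{X}^n). \tag{1.4.31}$$

This source is called the mixed source of a source family $\{\mathbf{X}_\theta\}_{\theta \in \Phi}$. Instead of the spectral sup-entropy rate, we define functions of $R$ that characterize the entropy-spectrum by

$$\underline{F}(R|\mathbf{X}) = \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R \right\}, \tag{1.4.32}$$

$$\overline{F}(R|\mathbf{X}) = \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R \right\}, \tag{1.4.33}$$

and attempt to express these functions in terms of $w(\cdot)$. Since this problem is hard for the case where the source $\mathbf{X}_\theta$ $(\theta \in \Phi)$ is general, we assume that each $\mathbf{X}_\theta$ is a stationary memoryless source subject to a probability distribution $P_{X_\theta}$ over a finite alphabet $\mathcal{X}$. We simply denote such $\mathbf{X}_\theta$ by $\mathbf{X}_\theta = \{X_\theta\}$. Then, we have the following lemma.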

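Lemma 1.4.4. Suppose that $\mathcal{X}$ is a finite source alphabet and each $\mathbf{X}_\theta = \{X_\theta\}$ is a stationary memoryless source. Then, for the mixed source $\mathbf{X}$ defined by (1.4.31), we have

$$\int_{\{\theta \mid H(X_\theta) > R\}} dw(\theta) \le \underline{F}(R|\mathbf{X}) \le \overline{F}(R|\mathbf{X}) \le \int_{\{\theta \mid H(X_\theta) \ge R\}} dw(\theta) \quad (\forall R \ge 0), \tag{1.4.34}$$

where $H(X_\theta)$ denotes the entropy, and the inequalities in (1.4.34) hold with equality except for at most countably infinitely many $R$.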



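Proof.

1) First, we prove

$$\int_{\{\theta \mid H(X_\theta) > R\}} dw(\theta) \le \underline{F}(R|\mathbf{X}). \tag{1.4.35}$$

Notice that, as is shown in the proof of Lemma 1.4.1, we have

$$\Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X_\theta^n)} \ge \frac{1}{n}\log\frac{1}{P_{X_\theta^n}(X_\theta^n)} - \gamma_n \right\} \ge 1 - e^{-n\gamma_n} \tag{1.4.36}$$

for any $\theta \in \Phi$, where

$$\gamma_n > 0, \quad \gamma_n \to 0, \quad n\gamma_n \to \infty \quad \text{as } n \to \infty. \tag{1.4.37}$$

While from (1.4.31) we have

$$\Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R \right\} = \int_{\Phi} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X_\theta^n)} \ge R \right\} dw(\theta),$$

by using (1.4.36) this can be rewritten as

$$\begin{aligned} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R \right\} &\ge \int_{\Phi} \left[ \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X_\theta^n}(X_\theta^n)} \ge R + \gamma_n \right\} - e^{-n\gamma_n} \right] dw(\theta) \\ &= \int_{\Phi} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X_\theta^n}(X_\theta^n)} \ge R + \gamma_n \right\} dw(\theta) - e^{-n\gamma_n}. \end{aligned}$$

Hence, Fatou's lemma and (1.4.37) yield

$$\liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R \right\} \ge \int_{\Phi} \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X_\theta^n}(X_\theta^n)} \ge R + \gamma_n \right\} dw(\theta).$$

We notice here that

$$\liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X_\theta^n}(X_\theta^n)} \ge R + \gamma_n \right\} = 0 \tag{1.4.38}$$

for $R > H(X_\theta)$ and

$$\liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X_\theta^n}(X_\theta^n)} \ge R + \gamma_n \right\} = 1 \tag{1.4.39}$$

for $R < H(X_\theta)$, owing to the assumption that $\mathbf{X}_\theta$ is stationary and memoryless. Then, (1.4.39) leads to

$$\underline{F}(R|\mathbf{X}) = \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R \right\} \ge \int_{\{\theta \mid H(X_\theta) > R\}} dw(\theta),$$

which establishes (1.4.35).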



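2) Next, we prove

$$\overline{F}(R|\mathbf{X}) \le \int_{\{\theta \mid H(X_\theta) \ge R\}} dw(\theta). \tag{1.4.40}$$

We first define the type $T_{\mathbf{x}}$ of $\mathbf{x} = (x_1, x_2, \cdots, x_n) \in \mathcal{X}^n$ by

$$T_{\mathbf{x}}(x) = \frac{n(x)}{n} \quad (\forall x \in \mathcal{X}),$$

where $n(x)$ means the occurrence number of $x$ as a component of $\mathbf{x}$. Denoting by $T_1, T_2, \cdots, T_{N_n}$ the list of all possible types, we recall that $N_n$ satisfies $N_n \le (n + 1)^{|\mathcal{X}|}$ (cf. Csiszár and Körner [19]). Here, note that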






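$$P_{X^n}(\mathbf{x}) = P_{X^n}(\mathbf{x}') \quad \text{for } T_{\mathbf{x}} = T_{\mathbf{x}'}, \tag{1.4.41}$$

$$P_{X_\theta^n}(\mathbf{x}) = P_{X_\theta^n}(\mathbf{x}') \quad \text{for } T_{\mathbf{x}} = T_{\mathbf{x}'} \tag{1.4.42}$$

from the assumption that all $\mathbf{X}_\theta$ are stationary memoryless. Let $R > 0$ be an arbitrary constant and set

$$S_n(R) = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, P_{X^n}(\mathbf{x}) \le e^{-nR} \right\}.$$

For each $\mathbf{x} \in \mathcal{X}^n$ define

$$\Phi(\mathbf{x}) = \left\{ \theta \in \Phi \,\middle|\, P_{X_\theta^n}(\mathbf{x}) \le e^{\sqrt{n}} P_{X^n}(\mathbf{x}) \right\}.$$

If we apply Markov's inequality (see Remark 1.1.1 in §1.1) to (1.4.31), we have

$$\Pr\{\theta \in \Phi(\mathbf{x})\} \ge 1 - e^{-\sqrt{n}} \quad (\mathbf{x} \in \mathcal{X}^n), \tag{1.4.43}$$

where $\Pr\{\cdot\}$ on the left-hand side means the probability with respect to $w(\cdot)$. On the other hand, (1.4.41) and (1.4.42) tell us that $\Phi(\mathbf{x})$ depends only on the type $T_{\mathbf{x}}$ of $\mathbf{x}$. That is, $\Phi(\mathbf{x}) = \Phi(\mathbf{x}')$ if $T_{\mathbf{x}} = T_{\mathbf{x}'}$. This enables us to express $\Phi(\mathbf{x})$ as $\Phi(T_k)$ $(T_k = T_{\mathbf{x}})$, where

$$\mathcal{T}_k = \{\mathbf{x} \in \mathcal{X}^n \mid T_{\mathbf{x}} = T_k\} \quad (k = 1, 2, \cdots, N_n).$$

We notice here that under the notations

$$S_n^{(\theta)}(R) = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, P_{X_\theta^n}(\mathbf{x}) \le e^{-nR + \sqrt{n}} \right\},$$

$$S_n(R) = \bigcup_{k=1}^{L_n} \mathcal{T}_k \quad (L_n \le N_n),$$

$$\Phi_n^* = \bigcap_{k=1}^{N_n} \Phi(T_k),$$

it holds that

$$S_n(R) \subset S_n^{(\theta)}(R) \quad (\forall \theta \in \Phi_n^*). \tag{1.4.44}$$

From (1.4.43) and $N_n \le (n + 1)^{|\mathcal{X}|}$, we have

$$\Pr\{\theta \in \Phi_n^*\} \ge 1 - (n + 1)^{|\mathcal{X}|} e^{-\sqrt{n}}. \tag{1.4.45}$$

On the other hand, (1.4.31) implies that

$$\Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R \right\} = P_{X^n}(S_n(R)) = \int_{\Phi} P_{X_\theta^n}(S_n(R))\, dw(\theta).$$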

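Then, it follows from (1.4.44) and (1.4.45) that

$$\begin{aligned} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R \right\} &= \int_{\Phi_n^*} P_{X_\theta^n}(S_n(R))\, dw(\theta) + \int_{\Phi \setminus \Phi_n^*} P_{X_\theta^n}(S_n(R))\, dw(\theta) \\ &\le \int_{\Phi_n^*} P_{X_\theta^n}(S_n^{(\theta)}(R))\, dw(\theta) + (n + 1)^{|\mathcal{X}|} e^{-\sqrt{n}} \\ &\le \int_{\Phi} P_{X_\theta^n}(S_n^{(\theta)}(R))\, dw(\theta) + (n + 1)^{|\mathcal{X}|} e^{-\sqrt{n}} \\ &= \int_{\Phi} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X_\theta^n}(X_\theta^n)} \ge R - \frac{1}{\sqrt{n}} \right\} dw(\theta) + (n + 1)^{|\mathcal{X}|} e^{-\sqrt{n}}. \end{aligned}$$

By using Fatou's lemma we obtain

$$\overline{F}(R|\mathbf{X}) = \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R \right\} \le \int_{\Phi} \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X_\theta^n}(X_\theta^n)} \ge R - \frac{1}{\sqrt{n}} \right\} dw(\theta). \tag{1.4.47}$$

Recalling that the source $\mathbf{X}_\theta$ is assumed to be stationary and memoryless, we obtain

$$\limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X_\theta^n}(X_\theta^n)} \ge R - \frac{1}{\sqrt{n}} \right\} = 0$$

for $R > H(X_\theta)$ and

$$\limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X_\theta^n}(X_\theta^n)} \ge R - \frac{1}{\sqrt{n}} \right\} = 1$$

for $R < H(X_\theta)$. By using these equalities, (1.4.47) can be written as

$$\overline{F}(R|\mathbf{X}) \le \int_{\{\theta \mid H(X_\theta) \ge R\}} dw(\theta),$$

which establishes (1.4.40). □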
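
Two facts about types carry this part of the proof: a memoryless probability depends on a sequence only through its type, and the number of types grows only polynomially in $n$. The brute-force sketch below is an added illustration, not the book's method; the alphabet, the distribution and the helper names are assumptions chosen for a small enumerable case, and exact rational arithmetic is used so that equality of probabilities can be tested exactly.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb, prod

def count_vector(x, alphabet):
    """Occurrence numbers n(a) of each letter a in x; the type is this vector divided by len(x)."""
    return tuple(x.count(a) for a in alphabet)

def probabilities_by_type(n, pmf):
    """Group all length-n sequences by type and collect their memoryless probabilities."""
    alphabet = sorted(pmf)
    groups = defaultdict(set)
    for x in product(alphabet, repeat=n):
        groups[count_vector(x, alphabet)].add(prod(pmf[s] for s in x))
    return groups

# Illustrative three-letter distribution (assumption).
pmf = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
n = 6
groups = probabilities_by_type(n, pmf)
num_types = len(groups)
print(num_types, (n + 1) ** len(pmf))
```

Every type class carries a single probability value, in line with (1.4.42), and the number of types stays below $(n+1)^{|\mathcal{X}|}$, so a union bound over types costs only a polynomial factor against the $e^{-\sqrt{n}}$ term in (1.4.45).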



Remark 1.4.4. As is easily checked from the proof above, (1.4.34) in Lemma 1.4.4 holds as well for the case that $\mathcal{X}$ is a finite source alphabet and each $\mathbf{X}_\theta = \{X_\theta^n = (X_1^{(\theta)}, \cdots, X_n^{(\theta)})\}_{n=1}^{\infty}$ $(\theta \in \Phi)$ is an m-th order stationary irreducible Markov source, where $m$ is finite and common for all $\theta$. We have only to use the Markov types instead of the types (cf. Davisson, Longo and Sgarro [21]). In such a case we need to replace $H(X_\theta)$ with

$$H(\mathbf{X}_\theta) = H(X_{m+1}^{(\theta)} \mid X_1^{(\theta)} \cdots X_m^{(\theta)}),$$

the entropy rate of $\mathbf{X}_\theta$ (see Example 1.3.2 in §1.3). This remark is also valid for (1.4.48) in Theorem 1.4.3 and (1.6.7) in Example 1.6.2 in §1.6. □



Theorem 1.3.1 and Lemma 1.4.4 immediately imply the following theorem. This theorem can be viewed as a special case of the general formula given by Winkelbauer [96] treating the case when each $\mathbf{X}_\theta$ is stationary and ergodic with finite alphabet $\mathcal{X}$ (also cf. Csiszár [18]).

Theorem 1.4.3. Suppose that $\mathcal{X}$ is a finite source alphabet and each $\mathbf{X}_\theta = \{X_\theta\}$ is a stationary memoryless source. Then, for the mixed source $\mathbf{X}$ defined by (1.4.31),

$$R_f(\mathbf{X}) = \overline{H}(\mathbf{X}) = w\text{-ess.sup}\, H(X_\theta), \tag{1.4.48}$$

where $w$-ess.sup on the right-hand side of (1.4.48) denotes the essential supremum¹ of $H(X_\theta)$ with respect to the probability measure $w$.

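Proof. For an arbitrary $R$ satisfying $R < w\text{-ess.sup}\, H(X_\theta)$, the left-most inequality of (1.4.34) in Lemma 1.4.4 guarantees

$$0 < \int_{\{\theta \mid H(X_\theta) > R\}} dw(\theta) \le \underline{F}(R|\mathbf{X}).$$

On the other hand, for an arbitrary $R$ satisfying $R > w\text{-ess.sup}\, H(X_\theta)$, the right-most inequality of (1.4.34) implies

$$\overline{F}(R|\mathbf{X}) \le \int_{\{\theta \mid H(X_\theta) \ge R\}} dw(\theta) = 0.$$

Therefore, we have $\overline{H}(\mathbf{X}) = w\text{-ess.sup}\, H(X_\theta)$. □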
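
For a mixing measure $w$ with finitely many atoms, the essential supremum in (1.4.48) reduces to a maximum over the atoms of positive weight, so components of weight zero cannot raise the coding rate. The sketch below is an added illustration under that finite-support assumption; Bernoulli components, natural logarithms and all names are hypothetical choices, not taken from the original.

```python
import math

def binary_entropy(p):
    """Entropy of a Bernoulli(p) distribution in nats, with the convention 0 log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def ess_sup(values, weights):
    """Essential supremum of a function on finitely many atoms: ignore atoms of weight zero."""
    return max(v for v, w in zip(values, weights) if w > 0)

def ess_inf(values, weights):
    """Essential infimum on finitely many atoms."""
    return min(v for v, w in zip(values, weights) if w > 0)

# Illustrative family: theta is the Bernoulli parameter; theta = 0.5 has weight zero.
thetas = [0.1, 0.3, 0.5]
weights = [0.5, 0.5, 0.0]
entropies = [binary_entropy(t) for t in thetas]

rate = ess_sup(entropies, weights)   # value of (1.4.48) for this toy family
print(rate, max(entropies))
```

Here the plain supremum is $\log 2$, attained at $\theta = 0.5$, but the essential supremum is $H$ at $\theta = 0.3$ because the atom at $0.5$ has weight zero.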



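Remark 1.4.5. In fact, the left-most inequality in Lemma 1.4.4 holds in the following form even if $\mathbf{X}_\theta$ is a general source with an arbitrary alphabet $\mathcal{X}$, as can be seen from the method of proof 1). Denoting by $\underline{H}(\mathbf{X}_\theta)$ the spectral inf-entropy rate of $\mathbf{X}_\theta$ (see §1.5), it holds that

$$\underline{F}(R|\mathbf{X}) \ge \int_{\{\theta \mid \underline{H}(\mathbf{X}_\theta) > R\}} dw(\theta) \quad (\forall R \ge 0). \tag{1.4.49}$$

Hence, while

$$\overline{H}(\mathbf{X}) \ge w\text{-ess.sup}\, \underline{H}(\mathbf{X}_\theta), \tag{1.4.50}$$

$$\underline{H}(\mathbf{X}) \ge w\text{-ess.inf}\, \underline{H}(\mathbf{X}_\theta)\text{²} \tag{1.4.51}$$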



¹ For a measurable function $Z_\theta$ of $\theta$, the essential supremum with respect to $w(\theta)$ is defined as $w\text{-ess.sup}\, Z_\theta = \inf\{\alpha \mid \Pr\{Z_\theta > \alpha\} = 0\}$.






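hold for an arbitrary family of general sources $\{\mathbf{X}_\theta\}_{\theta \in \Phi}$, we cannot generally know whether the inequality in the opposite direction holds. However, for the mixed source treated in Lemma 1.4.3 it holds that

$$\sum_{i : \overline{H}(\mathbf{X}_i) > R} \alpha_i \le \underline{F}(R|\mathbf{X}) \le \overline{F}(R|\mathbf{X}) \le \sum_{i : \overline{H}(\mathbf{X}_i) \ge R} \alpha_i$$

provided that each $\mathbf{X}_i$ $(i = 1, 2, \cdots)$ satisfies the strong converse property (see §1.5). In addition, for the mixed source $\mathbf{X}$ defined in Theorem 1.4.3, Lemma 1.4.4 yields that the spectral inf-entropy rate $\underline{H}(\mathbf{X})$ (see §1.5) of $\mathbf{X}$ can be expressed as

$$\underline{H}(\mathbf{X}) = w\text{-ess.inf}\, H(X_\theta). \tag{1.4.52}$$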

Example 1.4.3. Let $\{P_{X_\theta}\}$ be a family of probability distributions over a finite source alphabet $\mathcal{X}$ parameterized by $\theta$. For each $\theta$ denote by $\mathbf{X}_\theta$ the stationary memoryless source subject to the probability distribution $P_{X_\theta}$. That is, $\mathbf{x} = (x_1, \cdots, x_n) \in \mathcal{X}^n$ is generated with probability

$$P_\theta(\mathbf{x}) = \prod_{i=1}^{n} P_{X_\theta}(x_i).$$

Now, let $w(\theta)$ be an arbitrary probability measure and denote by $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ the mixed source obtained by mixing $\mathbf{X}_\theta$ with respect to a weight density $w(\theta)$. The probability distribution of $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is given by

$$P_{X^n}(\mathbf{x}) = \int P_\theta(\mathbf{x})\, dw(\theta) \quad (\forall n = 1, 2, \cdots; \forall \mathbf{x} \in \mathcal{X}^n).$$

Then, Lemma 1.4.4 claims that, in the limit of $n \to \infty$, the information-spectrum of the mixed source $\mathbf{X}$ is distributed along the horizontal axis with the weight density $w(\theta)$, where we take $H(X_\theta)$ as the horizontal axis (Fig. 1.6). This continuous information-spectrum is regarded as an extension of the two-point spectrum given in Example 1.4.1. □



1.5 Strong Converse Theorem for Source Coding 

While the direct part and the converse part together constitute one theorem on fixed-length source coding, we sometimes require the strong converse property for the converse part. This section is devoted to a description of the strong converse property.

² For a measurable function $Z_\theta$ of $\theta$, the essential infimum with respect to a probability measure $w(\theta)$ is defined as $w\text{-ess.inf}\, Z_\theta = \sup\{\beta \mid \Pr\{Z_\theta < \beta\} = 0\}$.







Definition 1.5.1. In fixed-length source coding, a source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is said to satisfy the strong converse property if any $(n, M_n, \varepsilon_n)$-code satisfying

$$\limsup_{n\to\infty} \frac{1}{n}\log M_n \le R$$

for an arbitrary $R$ such that $R < R_f(\mathbf{X})$ leads to

$$\lim_{n\to\infty} \varepsilon_n = 1.$$

We have the following theorem on the strong converse property. 

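Theorem 1.5.1. A source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ satisfies the strong converse property if and only if

$$\overline{H}(\mathbf{X}) = \underline{H}(\mathbf{X}),$$

where we define $\underline{H}(\mathbf{X})$ by

$$\underline{H}(\mathbf{X}) = \operatorname*{p-liminf}_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)}. \tag{1.5.1}$$

This quantity is called the spectral inf-entropy rate of the source $\mathbf{X}$. Clearly, $\underline{H}(\mathbf{X}) \ge 0$ since $P_{X^n}(X^n) \le 1$.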

Proof. 

1) Sufficiency: 

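Suppose that $\overline{H}(\mathbf{X}) = \underline{H}(\mathbf{X})$. For an arbitrary constant $\gamma > 0$, if we define $R$ by $R = R_f(\mathbf{X}) - 3\gamma$, we have $R = \overline{H}(\mathbf{X}) - 3\gamma$ from Theorem 1.3.1. We consider an arbitrary $(n, M_n, \varepsilon_n)$-code satisfying $\limsup_{n\to\infty} \frac{1}{n}\log M_n \le R$. Then, since

$$\frac{1}{n}\log M_n \le R + \gamma \quad (\forall n \ge n_0),$$

it follows from Lemma 1.3.2 that

$$\varepsilon_n \ge \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R + 2\gamma \right\} - e^{-n\gamma}. \tag{1.5.2}$$

We notice here that the first term on the right-hand side goes to 1 as $n \to \infty$ due to the assumption $\overline{H}(\mathbf{X}) = \underline{H}(\mathbf{X})$, since $R + 2\gamma = \underline{H}(\mathbf{X}) - \gamma$. By noticing that $e^{-n\gamma} \to 0$ as $n \to \infty$, we have $\lim_{n\to\infty} \varepsilon_n = 1$.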

2) Necessity: 

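Define $R$ by $R = R_f(\mathbf{X}) - \gamma$ for an arbitrary constant $\gamma > 0$. Consider an $(n, M_n, \varepsilon_n)$-code satisfying $M_n = e^{nR}$ and the condition given in Lemma 1.3.1. If the source satisfies the strong converse property, we have

$$1 = \liminf_{n\to\infty} \varepsilon_n \le \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R \right\}.$$

Therefore, it holds that

$$\lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} < R \right\} = 0,$$

which yields $R \le \underline{H}(\mathbf{X})$. However, we have

$$R = R_f(\mathbf{X}) - \gamma = \overline{H}(\mathbf{X}) - \gamma$$

from Theorem 1.3.1, which shows that $\overline{H}(\mathbf{X}) - \gamma \le \underline{H}(\mathbf{X})$. Since $\gamma > 0$ is arbitrary, we obtain

$$\overline{H}(\mathbf{X}) \le \underline{H}(\mathbf{X}).$$

Now we have established $\overline{H}(\mathbf{X}) = \underline{H}(\mathbf{X})$, since $\overline{H}(\mathbf{X}) \ge \underline{H}(\mathbf{X})$ always holds. □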



Remark 1.5.1. Generally, the information-spectrum of a source $\mathbf{X}$ is distributed between $\underline{H}(\mathbf{X})$ and $\overline{H}(\mathbf{X})$ in the limit of $n \to \infty$. In particular, if $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is stationary memoryless or stationary ergodic, we have $\overline{H}(\mathbf{X}) = \underline{H}(\mathbf{X})$ and therefore $\mathbf{X}$ satisfies the strong converse property ($\mathcal{X}$ need not be a finite source alphabet: Barron [7]). Thus, the information-spectrum concentrates on a single point $H(\mathbf{X})$ as $n \to \infty$. On the other hand, if $\mathbf{X}$ is a mixed source, we generally have $\overline{H}(\mathbf{X}) \ne \underline{H}(\mathbf{X})$. This fact follows from

$$\overline{H}(\mathbf{X}) = \max(\overline{H}(\mathbf{X}_1), \overline{H}(\mathbf{X}_2))$$

as is given in Lemma 1.4.1 and

$$\underline{H}(\mathbf{X}) = \min(\underline{H}(\mathbf{X}_1), \underline{H}(\mathbf{X}_2)),$$

which can be proved in the same way as Lemma 1.4.1 and Lemma 1.4.2 by replacing p-limsup and max by p-liminf and min, respectively. Hence, the mixed source does not in general satisfy the strong converse property. □



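Remark 1.5.2. Note that $\underline{H}(\mathbf{X})$ is not a quantity that makes sense only in the form of Theorem 1.5.1. It has an operational meaning and is independent of $\overline{H}(\mathbf{X})$. That is, the supremum of $R$ such that $\lim_{n\to\infty} \varepsilon_n = 1$ holds for all $(n, M_n, \varepsilon_n)$-codes satisfying $\limsup_{n\to\infty} \frac{1}{n}\log M_n \le R$ is nothing but $\underline{H}(\mathbf{X})$, which is obvious from the proof of Theorem 1.5.1. This remark is valid for all quantities in the strong converse theorems appearing in the following chapters: we will find $\underline{H}(\mathbf{Y})$ and $\overline{H}(\mathbf{X})$ in Chapter 2, $\sup_{\mathbf{X}} \underline{I}(\mathbf{X}; \mathbf{Y})$ and $\sup_{\mathbf{X} \in S_\Gamma} \underline{I}(\mathbf{X}; \mathbf{Y})$ in Chapter 3 and $\underline{D}(\mathbf{X} \| \overline{\mathbf{X}})$ in Chapter 4. □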



Example 1.5.1. It should be noted that the class of sources with the strong converse property is not confined to the class of stationary ergodic sources. In fact, the class of sources with the strong converse property is much broader than the class of stationary ergodic sources. That is, a source with the strong converse property is a source whose information-spectrum may oscillate with respect to the block length $n$ but concentrates on a fixed point as $n \to \infty$. Such sources are nonstationary and nonergodic in general. For example, consider the source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ with $\mathcal{X} = \{0, 1\}$ and satisfying

$$P_{X^n}(\mathbf{x}) = \begin{cases} 2^{-(n-1)} & (\text{if } x_1 = 1), \\ 0 & (\text{if } x_1 = 0) \end{cases}$$

for $\mathbf{x} = (x_1, x_2, \cdots, x_n)$. Though this source is nonstationary and ergodic, it satisfies the strong converse property since $\overline{H}(\mathbf{X}) = \underline{H}(\mathbf{X}) = \log 2$. Similarly, for a stationary memoryless source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ with the source alphabet $\mathcal{X} = \{0, 1\}$ define

$$A_n = \{\mathbf{x} \in \mathcal{X}^n \mid \lfloor wn \rfloor \text{ 1's are included in } \mathbf{x}\},$$

where $w$ is a constant satisfying $0 < w < 1$, and consider the constant-type source $\tilde{\mathbf{X}} = \{\tilde{X}^n\}_{n=1}^{\infty}$ with the probability distribution

$$P_{\tilde{X}^n}(\mathbf{x}) = \begin{cases} \dfrac{P_{X^n}(\mathbf{x})}{P_{X^n}(A_n)} & \text{for } \mathbf{x} \in A_n, \\ 0 & \text{for } \mathbf{x} \notin A_n. \end{cases}$$

Though this source is neither stationary nor ergodic, it has the strong converse property since it satisfies

$$\overline{H}(\tilde{\mathbf{X}}) = \underline{H}(\tilde{\mathbf{X}}) = h(w).$$
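
The constant-type source can be examined directly. Because a memoryless distribution assigns the same probability to every sequence in $A_n$, the conditioned source is uniform on $A_n$, and $\frac{1}{n}\log\frac{1}{P_{\tilde{X}^n}(\tilde{X}^n)}$ takes the single value $\frac{1}{n}\log|A_n|$. The sketch below is an added illustration; it assumes natural logarithms and takes $h(w)$ to be the binary entropy function, and the function names are hypothetical.

```python
import math

def binary_entropy(w):
    """Binary entropy h(w) in nats."""
    return -w * math.log(w) - (1 - w) * math.log(1 - w)

def constant_type_rate(n, w):
    """(1/n) log |A_n|, where A_n is the set of binary n-sequences with floor(w*n) ones."""
    k = math.floor(w * n)
    return math.log(math.comb(n, k)) / n

if __name__ == "__main__":
    w = 0.3   # illustrative value (assumption)
    for n in (10, 100, 1000, 10000):
        print(n, constant_type_rate(n, w), binary_entropy(w))
```

The spectrum is a single point for every $n$, and that point converges to $h(w)$, which is why the sup and inf spectral entropy rates coincide even though the source is neither stationary nor ergodic.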



1.6 ε-Source Coding



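In the source coding treated so far the error probability has been required to satisfy

$$\varepsilon_n = \Pr\{X^n \ne \psi_n(\varphi_n(X^n))\} \to 0 \quad \text{as } n \to \infty.$$

In this section, however, we consider source coding for the case that the error probability is required to satisfy only

$$\limsup_{n\to\infty} \varepsilon_n \le \varepsilon$$

for an arbitrarily fixed constant $0 \le \varepsilon < 1$. Since we weaken the requirement on the error probability, we can expect that the achievable rate becomes smaller. We begin with definitions.

Definition 1.6.1. Rate $R$ is ε-achievable if and only if there exists an $(n, M_n, \varepsilon_n)$-code satisfying

$$\limsup_{n\to\infty} \varepsilon_n \le \varepsilon \quad \text{and} \quad \limsup_{n\to\infty} \frac{1}{n}\log M_n \le R.$$

Definition 1.6.2 (ε-infimum achievable fixed-length coding rate).

$$R_f(\varepsilon|\mathbf{X}) = \inf\{R \mid R \text{ is ε-achievable}\}.$$

We now define

$$F(R) = \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R \right\}, \tag{1.6.1}$$

which is exactly the same function as $\overline{F}(R|\mathbf{X})$ that appeared in §1.4. Then, we have the following theorem.

Theorem 1.6.1 (Steinberg and Verdú [85]).

$$R_f(\varepsilon|\mathbf{X}) = \inf\{R \mid F(R) \le \varepsilon\} \quad (0 \le \forall\varepsilon < 1). \tag{1.6.2}$$

Remark 1.6.1. The right-hand side of (1.6.2) is a right continuous and monotone decreasing function of $\varepsilon$. If $F(R)$ on the right-hand side of (1.6.2) is replaced with

$$\limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} > R \right\},$$

Theorem 1.6.1 still holds. □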



Remark 1.6.2. Let $R_f(\mathbf{X})$ be the quantity defined in Definition 1.1.2. Since it clearly holds that $R_f(\mathbf{X}) = R_f(0|\mathbf{X})$ $(\varepsilon = 0)$, Theorem 1.6.1 can be regarded as a generalization of Theorem 1.3.1. □






Remark 1.6.3. Though it is a convention to use $\varepsilon_n \le \varepsilon$ $(\forall n \ge n_0)$ instead of $\limsup_{n\to\infty} \varepsilon_n \le \varepsilon$ in Definition 1.6.1, Theorem 1.6.1 no longer holds under such a definition. We only have the following inequalities:

$$\inf\{R \mid F(R) \le \varepsilon\} \le R_f(\varepsilon|\mathbf{X}) \le \inf\{R \mid F(R) < \varepsilon\} \quad (0 < \forall\varepsilon < 1). \tag{1.6.3}$$

(Here, the upper bound in (1.6.3) coincides with the lower bound except for at most countably many $0 < \varepsilon < 1$.) It is actually not (1.6.2) but (1.6.3) that Steinberg and Verdú proved. In addition, (1.6.3) cannot be regarded as a generalization of Theorem 1.3.1 since $\varepsilon$ in (1.6.3) cannot be set to 0.

This remark is valid for all theorems of this kind that will appear in this book. Hereafter, however, this remark is not repeated (it is valid for Theorem 2.4.1, Theorem 2.4.2 and Theorem 2.4.3 in Chapter 2, Theorem 3.4.1 and Theorem 3.6.6 in Chapter 3, Theorem 4.2.1 in Chapter 4, Theorem 6.2.1, Theorem 6.3.1, Theorem 6.4.1 and Theorem 6.4.2 in Chapter 6, and Theorem 7.4.1 and Theorem 7.11.1 in Chapter 7). □



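Proof of Theorem 1.6.1.

1) Converse part:

Suppose that $R$ is ε-achievable. Then, there exists an $(n, M_n, \varepsilon_n)$-code satisfying

$$\limsup_{n\to\infty} \varepsilon_n \le \varepsilon \quad \text{and} \quad \limsup_{n\to\infty} \frac{1}{n}\log M_n \le R.$$

Lemma 1.3.2 implies that for an arbitrary $\gamma > 0$ we have

$$\varepsilon_n \ge \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge \frac{1}{n}\log M_n + \gamma \right\} - e^{-n\gamma}.$$

Since

$$\frac{1}{n}\log M_n \le R + \gamma \quad (\forall n \ge n_0)$$

from $\limsup_{n\to\infty} \frac{1}{n}\log M_n \le R$, it follows that

$$\varepsilon_n \ge \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R + 2\gamma \right\} - e^{-n\gamma}.$$

By taking $\limsup_{n\to\infty}$ of both sides, we obtain

$$\varepsilon \ge \limsup_{n\to\infty} \varepsilon_n \ge F(R + 2\gamma), \tag{1.6.4}$$

which means

$$R \ge \inf\{R \mid F(R) \le \varepsilon\}$$

for the following reason. Suppose that the inequality above does not hold, i.e.,

$$R < \inf\{R \mid F(R) \le \varepsilon\}.$$

Then, we have

$$R + 2\gamma < \inf\{R \mid F(R) \le \varepsilon\}$$

since $\gamma > 0$ can be made arbitrarily small, which means that $F(R + 2\gamma) > \varepsilon$ and contradicts (1.6.4).

2) Direct part:

Next, we prove that

$$R = R_0 + \gamma$$

is ε-achievable, where $R_0 = \inf\{R \mid F(R) \le \varepsilon\}$ and $\gamma > 0$ is an arbitrarily small constant. If we set the size of the code to $M_n = e^{nR}$, we clearly have $\limsup_{n\to\infty} \frac{1}{n}\log M_n \le R$. Then, Lemma 1.3.1 guarantees that there exists an $(n, M_n, \varepsilon_n)$-code satisfying

$$\varepsilon_n \le \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R \right\}.$$

By taking $\limsup_{n\to\infty}$ of both sides, we have

$$\limsup_{n\to\infty} \varepsilon_n \le \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R \right\} = F(R) = F(R_0 + \gamma). \tag{1.6.5}$$

We notice here that from the definition of $R_0$ there exists $R'$ satisfying $R' < R_0 + \gamma$ and $F(R') \le \varepsilon$. By noting that $F(R)$ is a monotone decreasing function of $R$, it holds that

$$F(R_0 + \gamma) \le F(R') \le \varepsilon.$$

Thus, we have established $\limsup_{n\to\infty} \varepsilon_n \le \varepsilon$ from (1.6.5). □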



Example 1.6.1. If we consider a source with the strong converse property, i.e., satisfying $R_f(\mathbf{X}) = \overline{H}(\mathbf{X}) = \underline{H}(\mathbf{X})$, such as a stationary ergodic source, $F(R)$ becomes the function illustrated in Fig. 1.7. Therefore $R_f(\varepsilon|\mathbf{X})$ satisfies

$$R_f(\varepsilon|\mathbf{X}) = R_f(\mathbf{X}) \quad (0 \le \forall\varepsilon < 1) \tag{1.6.6}$$

and becomes a constant independent of $\varepsilon$. On the other hand, for the mixed source treated in Example 1.4.1 in §1.4, $F(R)$ becomes the function illustrated in Fig. 1.8 if $H(P_1) < H(P_2)$. For this source $R_f(\varepsilon|\mathbf{X})$ can be expressed as

$$R_f(\varepsilon|\mathbf{X}) = \begin{cases} H(P_2) & \text{for } 0 \le \varepsilon < \alpha_2, \\ H(P_1) & \text{for } \alpha_2 \le \varepsilon < 1 \end{cases}$$

and is dependent on $\varepsilon$.

Note that $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ does not always satisfy the strong converse property even if it satisfies (1.6.6). Consider as an example the source $\mathbf{X}$ that is uniformly distributed subject to $P_{X^n}(\mathbf{x}) = e^{-nR_1}$ and $P_{X^n}(\mathbf{x}) = e^{-nR_2}$ for odd and even $n$, respectively, and suppose that $0 < R_1 < R_2$. Though (1.6.6) holds for this source, this source does not satisfy the strong converse property because $\underline{H}(\mathbf{X}) = R_1$ and $\overline{H}(\mathbf{X}) = R_2$.
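
When the limiting spectrum consists of finitely many atoms, as for the mixed source above, the formula $R_f(\varepsilon|\mathbf{X}) = \inf\{R \mid F(R) \le \varepsilon\}$ of Theorem 1.6.1 can be evaluated mechanically. The sketch below is an added illustration under that assumption; it takes $F(R)$ to be the total probability of atoms located at or above $R$, and the function names and parameter values are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy of a Bernoulli(p) distribution in nats."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def eps_rate(eps, atoms):
    """inf{R : F(R) <= eps} for a limiting spectrum with finitely many atoms.

    `atoms` is a list of (location, probability) pairs whose probabilities sum to one,
    and F(R) is the total probability of the atoms located at or above R.
    """
    if not 0 <= eps < 1:
        raise ValueError("eps must satisfy 0 <= eps < 1")
    for b in sorted({loc for loc, _ in atoms}):
        # Value of F immediately to the right of the breakpoint b.
        mass_above = sum(pr for loc, pr in atoms if loc > b)
        if mass_above <= eps:
            return b
    raise AssertionError("unreachable: the mass above the largest atom is zero")

# Two-point spectrum of Example 1.4.1 with illustrative parameters (assumption).
H1, H2 = binary_entropy(0.1), binary_entropy(0.4)
atoms = [(H1, 0.3), (H2, 0.7)]
print([eps_rate(e, atoms) for e in (0.0, 0.5, 0.7, 0.9)])
```

With $H(P_1) < H(P_2)$ and $\alpha_2 = 0.7$, the rate equals $H(P_2)$ for $\varepsilon < 0.7$ and drops to $H(P_1)$ once $\varepsilon \ge 0.7$, reproducing the case distinction of the example.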



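Fig. 1.7. [Figure: $F(R)$ plotted against $R$; marked values: 1 on the vertical axis and $H(\mathbf{X})$ on the horizontal axis.]

Fig. 1.8. [Figure: $F(R)$ plotted against $R$; marked values: 1 on the vertical axis and $H(P_1)$, $H(P_2)$ on the horizontal axis.]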



Example 1.6.2. Consider the mixed source $\mathbf{X}$ given in Example 1.4.3. Lemma 1.4.4 in §1.4 claims that $F(R)$ satisfies the following inequality:

$$\int_{\{\theta \mid H(X_\theta) > R\}} dw(\theta) \le F(R) \le \int_{\{\theta \mid H(X_\theta) \ge R\}} dw(\theta).$$

In general, $F(R)$ is a monotone decreasing function as illustrated in Fig. 1.9. Therefore, $R_f(\varepsilon|\mathbf{X})$ is given by

$$R_f(\varepsilon|\mathbf{X}) = \inf\left\{ R \,\middle|\, \int_{\{\theta \mid H(X_\theta) \ge R\}} dw(\theta) \le \varepsilon \right\} \tag{1.6.7}$$

and becomes a monotone decreasing function as shown in Fig. 1.10. □



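Fig. 1.9. [Figure: plot of $F(R)$.]

Fig. 1.10. [Figure: plot of $R_f(\varepsilon|\mathbf{X})$; surviving axis label: $H(\mathbf{X})$.]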



Example 1.6.3. Let us consider the following nonstationary and nonergodic source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ with the source alphabet $\mathcal{X} = \{0, 1\}$. We first divide all time points $1, 2, \cdots$ into blocks of length $2, 2, 4, 8, 16, \cdots$ from the beginning. That is, the length of the k-th block is equal to 2 for $k = 1$ and $2^{k-1}$ for $k \ge 2$. Thus, the total length from the first block to the k-th block is equal to $2^k$. We assume that all blocks are independently generated. We also assume that each block is, with probability $\frac{1}{2}$ each, either equal to $\mathbf{x} = 00 \cdots 0$ (all the components are 0) or coincident with the stationary memoryless process taking either 1 or 0 with the same probability. Denote by

$$W_k = \frac{1}{2^k}\log\frac{1}{P_{X^{2^k}}(X^{2^k})}$$

the entropy density rate of $\mathbf{X}$ when $n = 2^k$. Then, we have the following recursive relationship on $W_k$ $(k = 1, 2, \cdots)$:

$$W_{k+1} = \frac{W_k + L_k + \Delta_k}{2}, \tag{1.6.8}$$

where $0 \le \Delta_k \le 2^{-k}\log 2$. Here, for each $k = 1, 2, \cdots$ the random variable $L_k + \Delta_k$ is independent of $W_k$ and satisfies $\Pr\{L_k = 0\} = \Pr\{L_k = \log 2\} = \frac{1}{2}$. The sum of deviations $\Delta_i$ $(i = 1, 2, \cdots, k)$ is upper-bounded as follows:

$$\sum_{i=1}^{k} 2^{-(k+1-i)} \Delta_i \le k\, 2^{-(k+1)} \log 2 \quad (k = 1, 2, \cdots).$$

Since this upper bound converges to 0 as $k \to \infty$, putting $\Delta_k = 0$ does not affect asymptotic properties of $W_k$. The distribution of $W_k$ converges to the uniform distribution over the interval $[0, \log 2]$ as $k \to \infty$ (Hill and Blanco [49]). This fact implies that $F(R) = 1 - \frac{R}{\log 2}$ $(0 \le R \le \log 2)$. From Theorem 1.6.1 we have the following formula:

$$R_f(\varepsilon|\mathbf{X}) = (1 - \varepsilon)\log 2 \quad (0 \le \forall\varepsilon < 1).$$
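
The uniform limit claimed for $W_k$ can be checked by simulating the idealized recursion $W_{k+1} = (W_k + L_k)/2$ with the deviations set to zero, as the example permits asymptotically. The Monte Carlo sketch below is an added illustration rather than a computation from the original; the starting value, number of steps, sample size and seed are arbitrary assumptions, and natural logarithms are used.

```python
import math
import random

LOG2 = math.log(2)

def sample_W(steps, rng):
    """One sample of W after `steps` iterations of W <- (W + L) / 2, with L in {0, log 2} equiprobable."""
    w = 0.0  # arbitrary start; its influence decays like 2**(-steps)
    for _ in range(steps):
        l = LOG2 if rng.random() < 0.5 else 0.0
        w = (w + l) / 2
    return w

def empirical_F(samples, R):
    """Fraction of samples with W >= R, an estimate of F(R)."""
    return sum(1 for w in samples if w >= R) / len(samples)

def empirical_rate(samples, eps):
    """Empirical version of inf{R : F(R) <= eps}: the (floor(eps*N)+1)-th largest sample."""
    ordered = sorted(samples, reverse=True)
    return ordered[math.floor(eps * len(ordered))]

rng = random.Random(0)
samples = [sample_W(40, rng) for _ in range(20000)]
print(empirical_F(samples, 0.5 * LOG2), empirical_rate(samples, 0.25), 0.75 * LOG2)
```

The estimate of $F(R)$ should be close to $1 - R/\log 2$ and the empirical rate close to $(1 - \varepsilon)\log 2$, up to sampling error.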



Remark 1.6.4. In the fixed-length coding problems for a general source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ described so far, the notion of the "infimum achievable fixed-length coding rate $R_f(\mathbf{X})$" plays a fundamental role. Nevertheless, it is also possible to treat the fixed-length coding problem for a general source without using the notion of the coding rate. One of the ways in this direction is a generalization of the notion of the asymptotic equipartition property of the source without taking the information-spectrum approach. See Verdú and Han [92] for the generalization of Theorem 1.1.1 in this direction. □



1.7 Coding for General Sources: Variable-Length Codes 

The variable-length coding for a given general source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ can be defined in the same way as was defined for a stationary memoryless source in §1.2. That is, we define that a rate $R$ is achievable in the same way as Definition 1.2.1 and $R_v(\mathbf{X})$ by Definition 1.2.2. Then, we have the following theorem, which is a generalized version of Theorem 1.2.1. We define $H(\mathbf{X})$ by

$$H(\mathbf{X}) = \limsup_{n\to\infty} \frac{1}{n} H(X^n) \tag{1.7.1}$$

and call it the sup-entropy rate of $\mathbf{X}$.

Theorem 1.7.1. On the variable-length coding of a general source $\mathbf{X}$ using a code alphabet $\mathcal{U}$ satisfying $K = |\mathcal{U}|$, we have

$$R_v(\mathbf{X}) = H_K(\mathbf{X})$$

(notice that the base of logarithms is $K$).

Proof. This theorem is proved in the same way as Theorem 1.2.1 in §1.2. It is easy to verify that the argument in the proof can be extended to the case where $\mathcal{X}$ is a countably infinite source alphabet. We just use Lemma 1.2.1, Lemma 1.2.2 and Lemma 1.2.3 with $M = +\infty$. □



Remark 1.7.1. Theorem 1.7.1 was first proved by Kieffer [57] for a nonstationary source $\mathbf{X}$ with a finite alphabet $\mathcal{X}$ satisfying the consistency condition. □



Remark 1.7.2. Though both $R_f(\mathbf{X})$ and $R_v(\mathbf{X})$ coincide with the entropy of the source in the encoding of stationary memoryless sources (Theorem 1.1.1 and Theorem 1.2.1), $R_v(\mathbf{X})$ does not always coincide with $R_f(\mathbf{X})$ in the encoding of general sources. Consider the mixed source in Example 1.4.1 in §1.4 as an example. While $\overline{H}(\mathbf{X})$ is equal to $\max(H(P_1), H(P_2))$, we have $H(\mathbf{X}) = \alpha_1 H(P_1) + \alpha_2 H(P_2)$. This means that $H(\mathbf{X}) \neq \overline{H}(\mathbf{X})$ in general. □
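The gap between the two rates in Remark 1.7.2 can be checked numerically. The following is a minimal sketch, assuming (for illustration only) that the two components of the mixed source are i.i.d. binary sources with parameters `p1` and `p2`; the function names and the parameter values are hypothetical and not taken from the text. It computes $\frac{1}{n}H(X^n)$ exactly by grouping sequences according to their number of ones, and compares it with $\alpha_1 H(P_1) + \alpha_2 H(P_2)$ and with $\max(H(P_1), H(P_2))$.

```python
import math

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def mixed_block_entropy(n, p1, p2, a1):
    """H(X^n) in nats for the mixture a1*Bern(p1)^n + (1-a1)*Bern(p2)^n."""
    a2 = 1 - a1
    total = 0.0
    for k in range(n + 1):
        # log-probability of one fixed sequence with k ones under each component
        l1 = k * math.log(p1) + (n - k) * math.log(1 - p1)
        l2 = k * math.log(p2) + (n - k) * math.log(1 - p2)
        m = max(l1, l2)
        log_px = m + math.log(a1 * math.exp(l1 - m) + a2 * math.exp(l2 - m))
        # log of the number of sequences with k ones
        log_count = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        total -= math.exp(log_count + log_px) * log_px
    return total

n, p1, p2, a1 = 400, 0.1, 0.4, 0.5  # illustrative values only
rate = mixed_block_entropy(n, p1, p2, a1) / n
average = a1 * binary_entropy(p1) + (1 - a1) * binary_entropy(p2)
largest = max(binary_entropy(p1), binary_entropy(p2))
print(rate, average, largest)
```

For large `n` the printed rate stays close to the weighted average of the component entropies, which is strictly below the largest component entropy whenever the two components differ.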



Remark 1.7.3. If $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is a stationary source, the right-hand side of (1.7.1) has a limit. Then, the sup-entropy rate can be expressed as

$$H(\mathbf{X}) = \lim_{n\to\infty} \frac{1}{n} H(X^n). \tag{1.7.2}$$

This quantity is simply called the entropy rate of $\mathbf{X}$ (see Example 1.3.2 in §1.3). The computation of the entropy rate is not easy in general. However, for a mixed source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ given in Theorem 1.4.3 (this source is a stationary source), Lemma 1.4.4 implies that the entropy rate can be computed as

$$H(\mathbf{X}) = \int H(\mathbf{X}_\theta)\, dw(\theta). \tag{1.7.3}$$

This is a part of the general formula by Gray [31] when consists of the 
standard space for a source alphabet X. □ 






The following theorem is useful when we consider relationships between 
the fixed-length coding rate and the variable-length coding rate. 

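Theorem 1.7.2. Suppose that $\mathcal{X}$ is a finite source alphabet. Then, it holds that

$$\underline{H}(\mathbf{X}) \le \liminf_{n\to\infty} \frac{1}{n} H(X^n) \le \limsup_{n\to\infty} \frac{1}{n} H(X^n) \le \overline{H}(\mathbf{X}) \le \log|\mathcal{X}|, \tag{1.7.4}$$

where the first inequality in (1.7.4) holds for any countably infinite (not restricted to finite) alphabet $\mathcal{X}$.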


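Proof. We have only to prove

$$\underline{H}(\mathbf{X}) \le \liminf_{n\to\infty} \frac{1}{n} H(X^n) \tag{1.7.5}$$

and

$$\limsup_{n\to\infty} \frac{1}{n} H(X^n) \le \overline{H}(\mathbf{X}) \le \log|\mathcal{X}|. \tag{1.7.6}$$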



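First, we prove (1.7.5). In this part of the proof the finiteness of the source alphabet $\mathcal{X}$ is unnecessary. Define

$$Z_n = \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)}$$

and let $\gamma > 0$ be an arbitrarily small constant. Denoting by $\mathbf{1}[\,\cdot\,]$ the indicator function, it follows that

$$\begin{aligned}
\frac{1}{n}H(X^n) &= E(Z_n) \\
&= E(Z_n\mathbf{1}[0 \le Z_n \le \underline{H}(\mathbf{X}) - \gamma]) + E(Z_n\mathbf{1}[Z_n > \underline{H}(\mathbf{X}) - \gamma]) \\
&\ge E(Z_n\mathbf{1}[Z_n > \underline{H}(\mathbf{X}) - \gamma]) \\
&\ge (\underline{H}(\mathbf{X}) - \gamma)\Pr\{Z_n > \underline{H}(\mathbf{X}) - \gamma\}.
\end{aligned}$$

We notice here that the definition of $\underline{H}(\mathbf{X})$ implies that

$$\lim_{n\to\infty}\Pr\{Z_n > \underline{H}(\mathbf{X}) - \gamma\} = 1,$$

which yields

$$\liminf_{n\to\infty}\frac{1}{n}H(X^n) \ge \underline{H}(\mathbf{X}) - \gamma.$$

Since $\gamma > 0$ is arbitrary, we obtain

$$\liminf_{n\to\infty}\frac{1}{n}H(X^n) \ge \underline{H}(\mathbf{X}).$$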

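Next, we prove (1.7.6). In this part of the proof we need the finiteness of the source alphabet $\mathcal{X}$. Define

$$Z_n = \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)}$$

again and let $\gamma > 0$ be an arbitrarily small constant. Then, we have

$$\frac{1}{n}H(X^n) = E(Z_n) = E(Z_n\mathbf{1}[0 \le Z_n \le \log|\mathcal{X}| + \gamma]) + E(Z_n\mathbf{1}[Z_n > \log|\mathcal{X}| + \gamma]).$$

Define $A_n$ and $B_n$ as the first and the second term on the right-hand side, i.e.,

$$A_n = E(Z_n\mathbf{1}[0 \le Z_n \le \log|\mathcal{X}| + \gamma]), \qquad B_n = E(Z_n\mathbf{1}[Z_n > \log|\mathcal{X}| + \gamma]).$$

First, we divide $A_n$ into two terms as follows:

$$A_n = E(Z_n\mathbf{1}[0 \le Z_n \le \overline{H}(\mathbf{X}) + \gamma]) + E(Z_n\mathbf{1}[\overline{H}(\mathbf{X}) + \gamma < Z_n \le \log|\mathcal{X}| + \gamma]).$$

Then, $A_n$ is upper-bounded as

$$A_n \le (\overline{H}(\mathbf{X}) + \gamma)\Pr\{0 \le Z_n \le \overline{H}(\mathbf{X}) + \gamma\} + (\log|\mathcal{X}| + \gamma)\Pr\{\overline{H}(\mathbf{X}) + \gamma < Z_n \le \log|\mathcal{X}| + \gamma\}.$$

Since the definition of $\overline{H}(\mathbf{X})$ implies that

$$\lim_{n\to\infty}\Pr\{0 \le Z_n \le \overline{H}(\mathbf{X}) + \gamma\} = 1, \qquad \lim_{n\to\infty}\Pr\{\overline{H}(\mathbf{X}) + \gamma < Z_n \le \log|\mathcal{X}| + \gamma\} = 0,$$

we obtain

$$\limsup_{n\to\infty} A_n \le \overline{H}(\mathbf{X}) + \gamma. \tag{1.7.7}$$

Next, let us prove

$$\limsup_{n\to\infty} B_n = 0.$$

We express $B_n$ in the following form:

$$B_n = \sum_{\mathbf{x}\in D_G}\frac{1}{n}P_{X^n}(\mathbf{x})\log\frac{P_{X^n}(D_G)}{P_{X^n}(\mathbf{x})} - \frac{1}{n}P_{X^n}(D_G)\log P_{X^n}(D_G),$$

where $G = \log|\mathcal{X}| + \gamma$ and

$$D_G = \left\{\mathbf{x}\in\mathcal{X}^n \,\middle|\, \frac{1}{n}\log\frac{1}{P_{X^n}(\mathbf{x})} > G\right\}.$$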






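Then, $B_n$ can be rewritten as

$$B_n = \frac{1}{n}P_{X^n}(D_G)\sum_{\mathbf{x}\in D_G}\frac{P_{X^n}(\mathbf{x})}{P_{X^n}(D_G)}\log\frac{P_{X^n}(D_G)}{P_{X^n}(\mathbf{x})} - \frac{1}{n}P_{X^n}(D_G)\log P_{X^n}(D_G).$$

By recalling the fact that the uniform distribution maximizes the entropy (Cover and Thomas [17]), it follows that

$$\begin{aligned}
B_n &\le \frac{1}{n}P_{X^n}(D_G)\log|D_G| - \frac{1}{n}P_{X^n}(D_G)\log P_{X^n}(D_G) \\
&\le P_{X^n}(D_G)\log|\mathcal{X}| - \frac{1}{n}P_{X^n}(D_G)\log P_{X^n}(D_G).
\end{aligned} \tag{1.7.8}$$

Using $P_{X^n}(\mathbf{x}) \le e^{-nG}$ for $\mathbf{x}\in D_G$ enables us to evaluate $P_{X^n}(D_G)$ in the following manner:

$$P_{X^n}(D_G) = \sum_{\mathbf{x}\in D_G}P_{X^n}(\mathbf{x}) \le \sum_{\mathbf{x}\in\mathcal{X}^n}e^{-nG} = e^{-n(G-\log|\mathcal{X}|)}. \tag{1.7.9}$$

By noticing $G - \log|\mathcal{X}| = \gamma$, we have

$$P_{X^n}(D_G) \le e^{-n\gamma} \to 0 \quad \text{as } n\to\infty. \tag{1.7.10}$$

Consequently, we obtain $\overline{H}(\mathbf{X}) \le \log|\mathcal{X}| + \gamma$. Since $\gamma > 0$ is arbitrary, we have $\overline{H}(\mathbf{X}) \le \log|\mathcal{X}|$. From (1.7.8) and (1.7.10) we further obtain

$$\limsup_{n\to\infty} B_n = 0. \tag{1.7.11}$$

The combination of (1.7.7) and (1.7.11) leads to

$$\limsup_{n\to\infty}\frac{1}{n}H(X^n) \le \limsup_{n\to\infty}A_n + \limsup_{n\to\infty}B_n \le \overline{H}(\mathbf{X}) + \gamma.$$

Since $\gamma > 0$ is arbitrary, we have finally established that

$$H(\mathbf{X}) = \limsup_{n\to\infty}\frac{1}{n}H(X^n) \le \overline{H}(\mathbf{X}). \tag{1.7.12}$$

□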



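Remark 1.7.4. In Theorem 1.7.2 the finiteness of a source alphabet $\mathcal{X}$ is assumed as a sufficient condition for (1.7.12). However, we can establish (1.7.12) under a weaker condition. That is, if the source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ satisfies uniform integrability, then (1.7.12) in Theorem 1.7.2 still holds even for the case that $\mathcal{X}$ is a countably infinite alphabet. Here, we say that $\mathbf{X}$ satisfies uniform integrability if

$$\left\{\frac{1}{n}\log\frac{1}{P_{X^n}(X^n)}\right\}_{n=1}^{\infty}$$

is uniformly integrable in the sense defined in §5.3 in Chapter 5. Equation (1.7.12) immediately follows from Lemma 5.3.2 in §5.3.

It is obvious that a source $\mathbf{X}$ with a finite alphabet $\mathcal{X}$ always satisfies uniform integrability. To see this, let $u$ be an arbitrary real number satisfying $u > \log|\mathcal{X}|$ and define

$$D_u = \left\{\mathbf{x}\in\mathcal{X}^n \,\middle|\, \frac{1}{n}\log\frac{1}{P_{X^n}(\mathbf{x})} > u\right\}.$$

Then, we have

$$\sum_{\mathbf{x}\in D_u}\frac{1}{n}P_{X^n}(\mathbf{x})\log\frac{1}{P_{X^n}(\mathbf{x})} \le P_{X^n}(D_u)\log|\mathcal{X}| - \frac{1}{n}P_{X^n}(D_u)\log P_{X^n}(D_u),$$

$$P_{X^n}(D_u) \le e^{-n(u-\log|\mathcal{X}|)} \le e^{-(u-\log|\mathcal{X}|)} \qquad (\forall n = 1, 2, \cdots)$$

similarly to the development of (1.7.8) and (1.7.9) in the proof of Theorem 1.7.2. □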



Theorem 1.5.1, Theorem 1.7.2 and Remark 1.7.4 immediately imply the 
following corollary. 

Corollary 1.7.1. Suppose that a source $\mathbf{X}$ satisfies uniform integrability (in particular, this condition is always satisfied if $\mathcal{X}$ is a finite source alphabet). If $\mathbf{X}$ satisfies the strong converse property, then the right-hand side of $H(\mathbf{X}) = \limsup_{n\to\infty}\frac{1}{n}H(X^n)$ has a limit and we have

$$\underline{H}(\mathbf{X}) = \overline{H}(\mathbf{X}) = H(\mathbf{X}) = \lim_{n\to\infty}\frac{1}{n}H(X^n). \tag{1.7.14}$$

That is, both $R_f(\mathbf{X})$ and $R_v(\mathbf{X})$ coincide with $H(\mathbf{X}) = \lim_{n\to\infty}\frac{1}{n}H(X^n)$ (the entropy rate) under the uniform integrability of $\mathbf{X}$. □



Remark 1.7.5. In fact, if a source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is stationary and ergodic with either a finite or a countably infinite alphabet $\mathcal{X}$, it satisfies the strong converse property, and therefore (1.7.14) always holds (see Example 1.3.2; Barron [7]). In addition,

$$\frac{1}{n}\log\frac{1}{P_{X^n}(X^n)}$$

converges almost surely to $\lim_{n\to\infty}\frac{1}{n}H(X^n)$. Hence, we have $R_f(\mathbf{X}) = R_v(\mathbf{X}) = H(\mathbf{X})$ in this case. □



Remark 1.7.6. While Remark 1.7.2 tells us that $R_v(\mathbf{X}) = H(\mathbf{X})$ in the variable-length coding does not coincide with $R_f(\mathbf{X}) = \overline{H}(\mathbf{X})$ in the fixed-length coding in general, Corollary 1.7.1 and Remark 1.7.5 give a sufficient condition for $R_v(\mathbf{X})$ to coincide with $R_f(\mathbf{X})$. On the other hand, even if this sufficient condition is not satisfied, we can say that, under the assumption of the uniform integrability of the source $\mathbf{X}$, the variable-length coding is better than the fixed-length coding from the viewpoint that a smaller coding rate is desirable. This is because Theorem 1.7.2 and Remark 1.7.4 always guarantee

$$R_v(\mathbf{X}) = H(\mathbf{X}) \le \overline{H}(\mathbf{X}) = R_f(\mathbf{X}).$$

In addition, the variable-length coding is better than the fixed-length coding from the viewpoint of the error probability as well. That is, while the former satisfies $\varepsilon_n = 0$ ($\forall n = 1, 2, \cdots$), the latter only satisfies $\varepsilon_n \to 0$. However, such advantages of the variable-length coding result from the assumption of the uniform integrability of $\mathbf{X}$. The advantages completely disappear if we consider general sources with a countably infinite alphabet $\mathcal{X}$. It is more likely for such general sources that $H(\mathbf{X}) > \overline{H}(\mathbf{X})$ holds instead of the third inequality in (1.7.4) of Theorem 1.7.2. We can give an example of such general sources.

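Since $\mathcal{X}$ is assumed to be a countably infinite alphabet, so is $\mathcal{X}^n$. First, for each $n = 1, 2, \cdots$ define arbitrarily two disjoint subsets $S_n$ and $T_n$ of $\mathcal{X}^n$ satisfying $|S_n| = 2^n$ and $|T_n| = 2^{n^3}$, respectively. Notice that we cannot choose $T_n$ satisfying $|T_n| = 2^{n^3}$ if $\mathcal{X}$ is a finite alphabet. We define the probability distribution $P_{X^n}$ over $\mathcal{X}^n$ by

$$P_{X^n}(\mathbf{x}) = \frac{1-\delta_n}{2^n}$$

if $\mathbf{x}\in S_n$ and

$$P_{X^n}(\mathbf{x}) = \frac{\delta_n}{2^{n^3}}$$

if $\mathbf{x}\in T_n$, where $\delta_n = \frac{1}{n}$. We define $P_{X^n}(\mathbf{x}) = 0$ for $\mathbf{x}\notin S_n\cup T_n$ (Fig. 1.11).

Denoting the source by $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$, the information-spectrum of $X^n$ is the two-point spectrum with two peaks at

$$\frac{1}{n}\log\frac{2^n}{1-\delta_n} = \log 2 - \frac{1}{n}\log\left(1-\frac{1}{n}\right)$$

and

$$\frac{1}{n}\log\frac{2^{n^3}}{\delta_n} = n^2\log 2 + \frac{1}{n}\log n$$

of probabilities $1-\delta_n$ and $\delta_n$, respectively (Fig. 1.12). Since the second peak of probability $\delta_n$ disappears as $n\to\infty$, we have $\overline{H}(\mathbf{X}) = \log 2$. In addition, we can know that the source satisfies the strong converse property since it is easy to verify that $\underline{H}(\mathbf{X}) = \log 2$. On the other hand, the entropy of $X^n$ can be written as

$$\begin{aligned}
H(X^n) &= (1-\delta_n)\log\frac{2^n}{1-\delta_n} + \delta_n\log\frac{2^{n^3}}{\delta_n} \\
&= (n-1)\log 2 + n^2\log 2 + h\!\left(\frac{1}{n}\right),
\end{aligned}$$

where $h(\cdot)$ denotes the binary entropy, which satisfies

$$H(\mathbf{X}) = \limsup_{n\to\infty}\frac{1}{n}H(X^n) = +\infty.$$

Therefore, it trivially holds that $\overline{H}(\mathbf{X}) < H(\mathbf{X})$ (this means that the source $\mathbf{X}$ does not satisfy the uniform integrability).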
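The behaviour of this two-peak source can be tabulated numerically. The following is a minimal sketch, assuming the reading $|S_n| = 2^n$, $|T_n| = 2^{n^3}$ and $\delta_n = 1/n$ of the construction above (the exponent $n^3$ is an assumption about the garbled source, chosen so that the entropy rate diverges as the text states); the function names are hypothetical. It evaluates the two peak locations and the normalized entropy $\frac{1}{n}H(X^n)$ as the mass-weighted mean of the peaks.

```python
import math

def peaks(n):
    """Return ((location, mass), (location, mass)) of the two spectrum peaks, n >= 2."""
    d = 1.0 / n
    low = math.log(2) - math.log(1 - d) / n       # (1/n) log(2^n / (1 - delta_n))
    high = n**2 * math.log(2) + math.log(n) / n   # (1/n) log(2^(n^3) / delta_n)
    return (low, 1 - d), (high, d)

def entropy_rate(n):
    """(1/n) H(X^n): the mean of the information spectrum."""
    (low, m_low), (high, m_high) = peaks(n)
    return m_low * low + m_high * high

def closed_form(n):
    """((n-1) log 2 + n^2 log 2 + h(1/n)) / n, with h the binary entropy."""
    d = 1.0 / n
    h = -d * math.log(d) - (1 - d) * math.log(1 - d)
    return ((n - 1) * math.log(2) + n**2 * math.log(2) + h) / n

for n in (2, 10, 100, 1000):
    print(n, peaks(n)[0][0], entropy_rate(n))
```

The lower peak stays near $\log 2$ while the entropy rate grows roughly like $n\log 2$: the vanishing mass $\delta_n$ sits so far out in the spectrum that it dominates the mean.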



[Fig. 1.11: probability distribution of $X^n$, with total mass $1-\delta_n$ on $S_n$ and the remaining mass on $T_n$.]



This example can be interpreted as follows from the viewpoint of coding. Since $\overline{H}(\mathbf{X}) = \log 2$, we can make the error probability $\varepsilon_n$ tend to 0 by the fixed-length coding provided that the coding rate is at least $\log 2$. On the other hand, when the error probability is required to strictly satisfy $\varepsilon_n = 0$ ($\forall n = 1, 2, \cdots$), the coding rate tends to $+\infty$ even if we use the variable-length coding. Readers may find this example strange, considering that the variable-length coding includes the fixed-length coding. However, the example can be regarded as showing that a finite rate is not meaningful if we strictly require $\varepsilon_n = 0$ ($\forall n = 1, 2, \cdots$), though it enables a meaningful coding satisfying $\varepsilon_n \to 0$. □



[Fig. 1.12: information spectrum of $X^n$.]

1.8 Coding for General Source: Weak Variable-Length Codes

So far the error probability $\varepsilon_n$ of the variable-length coding using a prefix code $(\varphi_n, \psi_n)$ has been required to satisfy $\varepsilon_n = 0$ for all $n = 1, 2, \cdots$. We can also consider the variable-length coding under a weaker requirement of $\lim_{n\to\infty}\varepsilon_n = 0$. Here, denoting a source alphabet by $\mathcal{X}$, we require that $\varphi_n(\mathcal{X}^n)$, the set of all codewords, is a prefix code. However, we do not require that $\varphi_n$ is one-to-one (this means that $\varphi_n(\mathbf{x}) = \varphi_n(\mathbf{x}')$ may happen for some $\mathbf{x}\neq\mathbf{x}'$). We call such a variable-length code $(\varphi_n, \psi_n)$ the weak variable-length code or the weak prefix code.

In the formulation of the coding problem of the weak variable-length code, 
we have only to use: 

Definition 1.8.1.

Rate $R$ is achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists a weak variable-length code $(\varphi_n, \psi_n)$ satisfying $\lim_{n\to\infty}\varepsilon_n = 0$ and $\limsup_{n\to\infty}\frac{1}{n}E|\varphi_n(X^n)| \le R$.

Definition 1.8.2 (Infimum achievable weak variable-length coding rate).

$$R_v^*(\mathbf{X}) = \inf\{R \mid R \text{ is achievable}\}$$

instead of Definition 1.2.1 and Definition 1.2.2 in §1.2.

If a source $\mathbf{X}$ satisfies the uniform integrability, we have the theorem in which $R_v(\mathbf{X})$ in Theorem 1.7.1 is replaced with $R_v^*(\mathbf{X})$. That is, we have the following theorem.

Theorem 1.8.1. If a source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ satisfies the uniform integrability, then

$$R_v^*(\mathbf{X}) = R_v(\mathbf{X}) = H_K(\mathbf{X}). \tag{1.8.1}$$






Remark 1.8.1. This theorem means that, if a source $\mathbf{X}$ satisfies the uniform integrability, we have the same infimum achievable rate as in the case of $\varepsilon_n = 0$ ($\forall n = 1, 2, \cdots$) even if we consider the variable-length prefix code in the weak sense satisfying $\lim_{n\to\infty}\varepsilon_n = 0$. However, notice that the weak variable-length code has the effect of yielding fewer extremely long codewords. Let us consider the source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ with the source alphabet $\mathcal{X} = \{0, 1\}$ whose probability distribution is defined by







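$P_{X^n}(\mathbf{x}) = 0$ otherwise,

where $1^{n-1}$ and $0^{n-1}$ denote the concatenations of $n-1$ 1's and 0's, respectively. The Huffman code [52] $\varphi_n : \mathcal{X}^n \to \{0,1\}^*$ for this source $\mathbf{X}$ is given by

$$\varphi_n(11^{n-1}) = 1, \qquad \varphi_n(10^{n-1}) = 01, \qquad \varphi_n(0\mathbf{u}) = 00\mathbf{u} \quad (\forall\mathbf{u}\in\{0,1\}^{n-1}),$$

which minimizes the average codeword length under $\varepsilon_n = 0$. If we define $\tilde{\varphi}_n$ as

$$\tilde{\varphi}_n(11^{n-1}) = 1, \qquad \tilde{\varphi}_n(10^{n-1}) = 01, \qquad \tilde{\varphi}_n(0\mathbf{u}) = 1 \quad (\forall\mathbf{u}\in\{0,1\}^{n-1}),$$

this is the weak variable-length code satisfying $\varepsilon_n \to 0$ as $n\to\infty$.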

While the average codeword length and the maximal codeword length of this 
Huffman code are 



and $n+1$, respectively, those of the weak variable-length code are

$$\frac{3}{2} - \frac{1}{2\sqrt{n}}$$

and 2, respectively. Here, it is clear that $R_v^*(\mathbf{X}) = R_v(\mathbf{X}) = 0$. □






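Proof of Theorem 1.8.1.

Since $R_v^*(\mathbf{X}) \le R_v(\mathbf{X}) = H_K(\mathbf{X})$ is clear, we have only to show $R_v^*(\mathbf{X}) \ge H_K(\mathbf{X})$ for proving $R_v^*(\mathbf{X}) = H_K(\mathbf{X})$. Let an encoder $\varphi_n$ and a decoder $\psi_n$ of a weak variable-length code with the error probability $\varepsilon_n$ satisfying $\lim_{n\to\infty}\varepsilon_n = 0$ be arbitrarily given. By defining

$$T_n = \{\mathbf{x} \mid \mathbf{x} = \psi_n(\varphi_n(\mathbf{x}))\},$$

we have a prefix code $\varphi_n(T_n)$. Since $\varepsilon_n \to 0$ as $n\to\infty$, it holds that

$$\varepsilon_n = P_{X^n}(T_n^c) \to 0 \quad \text{as } n\to\infty. \tag{1.8.2}$$

Thus, if we define another source $\tilde{\mathbf{X}} = \{\tilde{X}^n\}_{n=1}^{\infty}$ by

$$P_{\tilde{X}^n}(\mathbf{x}) = \begin{cases} \dfrac{P_{X^n}(\mathbf{x})}{P_{X^n}(T_n)} & \text{for } \mathbf{x}\in T_n, \\ 0 & \text{otherwise,} \end{cases}$$

$\varphi_n(T_n)$ can be regarded as a prefix code for the source $\tilde{\mathbf{X}}$ with the error probability equal to 0. Then, it follows that

$$\begin{aligned}
\frac{1}{n}E|\varphi_n(X^n)| &= \sum_{\mathbf{x}\in T_n}\frac{1}{n}P_{X^n}(\mathbf{x})|\varphi_n(\mathbf{x})| + \sum_{\mathbf{x}\in T_n^c}\frac{1}{n}P_{X^n}(\mathbf{x})|\varphi_n(\mathbf{x})| \\
&\ge P_{X^n}(T_n)\cdot\frac{1}{n}E|\varphi_n(\tilde{X}^n)|.
\end{aligned}$$

By taking $\limsup$ of both sides, we obtain

$$\limsup_{n\to\infty}\frac{1}{n}E|\varphi_n(X^n)| \ge H_K(\tilde{\mathbf{X}}) \tag{1.8.3}$$

from (1.8.2) and Theorem 1.7.1 using $\tilde{\mathbf{X}}$ instead of $\mathbf{X}$. On the other hand, it follows that

$$\begin{aligned}
\frac{1}{n}H_K(X^n) &= -\sum_{\mathbf{x}\in T_n}\frac{1}{n}P_{X^n}(\mathbf{x})\log_K P_{X^n}(\mathbf{x}) - \sum_{\mathbf{x}\in T_n^c}\frac{1}{n}P_{X^n}(\mathbf{x})\log_K P_{X^n}(\mathbf{x}) \\
&= P_{X^n}(T_n)\cdot\frac{1}{n}H_K(\tilde{X}^n) - \frac{1}{n}P_{X^n}(T_n)\log_K P_{X^n}(T_n) + \sum_{\mathbf{x}\in T_n^c}\frac{1}{n}P_{X^n}(\mathbf{x})\log_K\frac{1}{P_{X^n}(\mathbf{x})}.
\end{aligned} \tag{1.8.4}$$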




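Recall here the assumption that the source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ satisfies the uniform integrability. By using (1.8.2) and Lemma 5.3.1 in §5.3 with $A_n = T_n^c$, the third term on the right-hand side of (1.8.4) is evaluated as

$$\sum_{\mathbf{x}\in T_n^c}\frac{1}{n}P_{X^n}(\mathbf{x})\log_K\frac{1}{P_{X^n}(\mathbf{x})} \to 0 \quad \text{as } n\to\infty. \tag{1.8.5}$$

In addition, the second term on the right-hand side of (1.8.4) satisfies

$$\frac{1}{n}P_{X^n}(T_n)\log_K P_{X^n}(T_n) \to 0 \quad \text{as } n\to\infty \tag{1.8.6}$$

due to (1.8.2). Summarizing (1.8.2) and (1.8.4)-(1.8.6), we have

$$H_K(\mathbf{X}) = \limsup_{n\to\infty}\frac{1}{n}H_K(X^n) = \limsup_{n\to\infty}\frac{1}{n}H_K(\tilde{X}^n) = H_K(\tilde{\mathbf{X}}).$$

Then, (1.8.3) implies that

$$\limsup_{n\to\infty}\frac{1}{n}E|\varphi_n(X^n)| \ge H_K(\mathbf{X}).$$

We can conclude that $R_v^*(\mathbf{X}) \ge H_K(\mathbf{X})$ since $(\varphi_n, \psi_n)$ is arbitrary as a weak variable-length code for $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$. □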

Remark 1.8.2. Note that there are sources satisfying

$$R_v^*(\mathbf{X}) < R_v(\mathbf{X}) = H_K(\mathbf{X})$$

if $\mathcal{X}$ is a countably infinite source alphabet. For example, consider the source $\mathbf{X}$ given in Remark 1.7.6. If each element in $S_n$ is mapped to a distinct binary sequence of length $n$ and all elements in $T_n$ are mapped to $0^n = 00\cdots0$ of length $n$, this code is trivially a weak variable-length code with the error probability $\varepsilon_n = \frac{1}{n}$ satisfying $\lim_{n\to\infty}\varepsilon_n = 0$. In addition, though $R_v(\mathbf{X}) = H(\mathbf{X}) = +\infty$, we have $R_v^*(\mathbf{X}) \le 1$ from the fact that the average codeword length of this weak variable-length code is equal to $n$. In fact, we can show that $R_v^*(\mathbf{X}) = 1$ (see Example 1.8.1 below). □



Now, we try to develop a generalized version of Theorem 1.8.1 giving a general formula of $R_v^*(\mathbf{X})$ for a source $\mathbf{X}$ with a countably infinite alphabet $\mathcal{X}$ not necessarily satisfying the uniform integrability. To this end, we need to introduce a new quantity different from the sup-entropy rate $H(\mathbf{X})$. We first describe the new quantity.

Let $Z$ be a random variable that takes values in a countably infinite alphabet $\mathcal{Z}$. For an arbitrary subset $A\subset\mathcal{Z}$ we define the conditional entropy $H(Z|A)$ under the condition $Z\in A$ by

$$H(Z|A) = -\sum_{z\in\mathcal{Z}}P_{Z|A}(z)\log P_{Z|A}(z),$$

where $P_{Z|A}(z)$ is defined by

$$P_{Z|A}(z) = \begin{cases} \dfrac{P_Z(z)}{\Pr\{Z\in A\}} & \text{for } z\in A, \\ 0 & \text{otherwise.} \end{cases}$$

For an arbitrary constant $0 \le \varepsilon < 1$ we further define $H_{[\varepsilon]}(Z)$ by

$$H_{[\varepsilon]}(Z) = \inf_{A:\,\Pr\{Z\in A\}\ge 1-\varepsilon} H(Z|A), \tag{1.8.7}$$

where $H_{[\varepsilon]}(Z)$ is called the $\varepsilon$-entropy of $Z$. It is obvious that

$$H_{[\varepsilon]}(Z) \le H(Z) \qquad (0 \le \forall\varepsilon < 1), \tag{1.8.8}$$

$$H_{[0]}(Z) = H(Z) \qquad (\varepsilon = 0). \tag{1.8.9}$$

Thus, we can regard $H_{[\varepsilon]}(Z)$ as an extension of the ordinary entropy. In addition, $H_{[\varepsilon]}(Z)$ is monotone decreasing as a function of $\varepsilon$.
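For a small finite alphabet the $\varepsilon$-entropy can be evaluated by exhaustive search, which makes properties (1.8.8) and (1.8.9) easy to observe. The following is a minimal sketch; the function names and the example distribution are hypothetical, the search is exponential in the alphabet size, and the small tolerance `1e-12` is an assumption added only to absorb floating-point rounding in the constraint $\Pr\{Z\in A\}\ge 1-\varepsilon$.

```python
import math
from itertools import combinations

def conditional_entropy(p, subset):
    """H(Z|A) in nats for the index set `subset` of the distribution p."""
    mass = sum(p[i] for i in subset)
    return -sum((p[i] / mass) * math.log(p[i] / mass) for i in subset if p[i] > 0)

def eps_entropy(p, eps):
    """Brute-force infimum of H(Z|A) over all A with Pr{Z in A} >= 1 - eps (0 <= eps < 1)."""
    best = math.inf
    for r in range(1, len(p) + 1):
        for subset in combinations(range(len(p)), r):
            if sum(p[i] for i in subset) >= 1 - eps - 1e-12:
                best = min(best, conditional_entropy(p, subset))
    return best

p = [0.5, 0.25, 0.125, 0.125]  # illustrative distribution
for eps in (0.0, 0.1, 0.25, 0.5):
    print(eps, eps_entropy(p, eps))
```

With `eps = 0` only the full alphabet is admissible, so the ordinary entropy is recovered; as `eps` grows, more low-probability symbols may be discarded and the value can only decrease.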

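Now, for an arbitrary general source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ define the weak sup-entropy rate of $\mathbf{X}$ by

$$H^*(\mathbf{X}) = \lim_{\varepsilon\downarrow 0}\limsup_{n\to\infty}\frac{1}{n}H_{[\varepsilon]}(X^n). \tag{1.8.10}$$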



Remark 1.8.3. If we define the $\varepsilon$-entropy by

(1.8.11)

instead of (1.8.7), the value of $H^*(\mathbf{X})$ defined in (1.8.10) remains the same (Yamamoto [104]). □



It clearly follows from (1.8.8) with setting $Z = X^n$ that

$$H^*(\mathbf{X}) \le H(\mathbf{X}), \tag{1.8.12}$$

where $H(\mathbf{X})$ denotes the sup-entropy rate of $\mathbf{X}$.

Under these definitions we have the following general theorem on the infimum achievable weak variable-length coding rate $R_v^*(\mathbf{X})$.

Theorem 1.8.2 (Han [41]).

$$R_v^*(\mathbf{X}) = H_K^*(\mathbf{X}), \tag{1.8.13}$$

where $K$ on the right-hand side denotes the base of logarithms.






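Proof. In the following proof we omit the base of logarithms $K$ for simplicity.

1) Direct part:

Fix $0 < \mu < 1$ arbitrarily. Then, (1.8.7) implies that for an arbitrarily small constant $\gamma > 0$ there exists a subset $A_n\subset\mathcal{X}^n$ ($n = 1, 2, \cdots$) satisfying

$$\Pr\{X^n\in A_n\} \ge 1-\mu \qquad (n = 1, 2, \cdots), \tag{1.8.14}$$

$$H(X^n|A_n) \le H_{[\mu]}(X^n) + \gamma \qquad (n = 1, 2, \cdots). \tag{1.8.15}$$

Similarly to the proof 1) of Theorem 1.2.1 in §1.2, it turns out that there exists a prefix code $\tilde{\varphi}_{\mu,n} : A_n \to \mathcal{U}^*$ with no decoding error satisfying

$$E[\,|\tilde{\varphi}_{\mu,n}(X^n)|\;|\;X^n\in A_n] \le H(X^n|A_n) + 2, \tag{1.8.16}$$

where the left-hand side denotes the conditional expectation under the condition of $X^n\in A_n$ (more precisely, we need to use an extended version of the proof 1) that is valid for a source with a countably infinite alphabet). Then, we have

$$E[\,|\tilde{\varphi}_{\mu,n}(X^n)|\;|\;X^n\in A_n] \le H_{[\mu]}(X^n) + 2 + \gamma \tag{1.8.17}$$

from (1.8.15). Now, we can define a weak prefix code $\varphi_{\mu,n} : \mathcal{X}^n \to \mathcal{U}^*$ by

$$\varphi_{\mu,n}(\mathbf{x}) = \begin{cases} 1\,\tilde{\varphi}_{\mu,n}(\mathbf{x}) & \text{for } \mathbf{x}\in A_n, \\ 0 & \text{otherwise,} \end{cases}$$

where $1\,\tilde{\varphi}_{\mu,n}(\mathbf{x})$ denotes the codeword $\tilde{\varphi}_{\mu,n}(\mathbf{x})$ preceded by the symbol 1. We define a decoder $\psi_{\mu,n} : \varphi_{\mu,n}(A_n) \to A_n$ as the inverse map of $\varphi_{\mu,n}|_{A_n}$. Then, owing to (1.8.14), the error probability

$$\varepsilon_n(\mu) = \Pr\{X^n \neq \psi_{\mu,n}(\varphi_{\mu,n}(X^n))\}$$

of this encoder satisfies

$$\varepsilon_n(\mu) = \Pr\{X^n\notin A_n\} \le \mu \qquad (n = 1, 2, \cdots). \tag{1.8.18}$$

Since we have

$$E|\varphi_{\mu,n}(X^n)| = \Pr\{X^n\in A_n\}\left(E[\,|\tilde{\varphi}_{\mu,n}(X^n)|\;|\;X^n\in A_n] + 1\right) + \Pr\{X^n\notin A_n\},$$

(1.8.17) leads to

$$E|\varphi_{\mu,n}(X^n)| \le H_{[\mu]}(X^n) + 4 + \gamma.$$

Accordingly,

$$\frac{1}{n}E|\varphi_{\mu,n}(X^n)| \le \frac{1}{n}H_{[\mu]}(X^n) + \frac{4+\gamma}{n}.$$

Thus we obtain

$$\limsup_{n\to\infty}\frac{1}{n}E|\varphi_{\mu,n}(X^n)| \le \limsup_{n\to\infty}\frac{1}{n}H_{[\mu]}(X^n). \tag{1.8.19}$$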
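The construction in the direct part (a zero-error prefix code on a high-probability set, preceded by one flag symbol, with everything else collapsed onto a single short codeword) can be sketched for a finite distribution. This is only an illustration under stated assumptions: the set $A_n$ is chosen greedily by decreasing probability rather than as a minimizer in (1.8.7), a binary Huffman code stands in for the prefix code of (1.8.16), and the names `huffman_code`, `weak_prefix_code`, and `is_prefix_free` are hypothetical.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Binary Huffman code for a non-empty dict symbol -> probability."""
    if len(probs) == 1:
        return {next(iter(probs)): "0"}
    tiebreak = count()  # unique integers so that dicts are never compared
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

def weak_prefix_code(probs, mu):
    """Weak prefix code with error probability at most mu (0 <= mu < 1).

    Symbols in the kept set A get '1' + Huffman word; all others get '0'.
    """
    kept, mass = [], 0.0
    for s in sorted(probs, key=probs.get, reverse=True):
        if mass >= 1 - mu:
            break
        kept.append(s)
        mass += probs[s]
    inner = huffman_code({s: probs[s] / mass for s in kept})
    code = {s: "1" + inner[s] if s in inner else "0" for s in probs}
    return code, set(kept)

def is_prefix_free(words):
    """True if no codeword in the set is a proper prefix of another."""
    ws = sorted(set(words))
    return all(not b.startswith(a) for a, b in zip(ws, ws[1:]))

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.0625, "e": 0.0625}
code, kept = weak_prefix_code(probs, mu=0.2)
print(code, is_prefix_free(code.values()))
```

The set of codewords remains prefix-free even though the encoder is no longer one-to-one, and the symbols outside the kept set are exactly the ones decoded in error, so the error probability is their total mass.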

Hereafter, we use an argument called the diagonal line argument. Fix a sequence $\{\mu_i\}_{i=1}^{\infty}$ satisfying $1 > \mu_1 > \mu_2 > \cdots \to 0$ and consider the weak prefix code $(\varphi_{\mu_i,n}, \psi_{\mu_i,n})$ constructed in the same way when $\mu = \mu_i$ ($i = 1, 2, \cdots$). Then, we have

$$\limsup_{n\to\infty}\frac{1}{n}E|\varphi_{\mu_i,n}(X^n)| \le \limsup_{n\to\infty}\frac{1}{n}H_{[\mu_i]}(X^n) \equiv h_i \qquad (\forall i = 1, 2, \cdots) \tag{1.8.20}$$

from (1.8.19). Equation (1.8.18) guarantees that the error probability of this code $(\varphi_{\mu_i,n}, \psi_{\mu_i,n})$ satisfies

$$\varepsilon_n(\mu_i) \le \mu_i \qquad (\forall i = 1, 2, \cdots;\ \forall n = 1, 2, \cdots). \tag{1.8.21}$$

Here we notice from (1.8.20) that for an arbitrary $\delta > 0$ there exists a sequence of positive integers $\{n_i\}_{i=1}^{\infty}$ satisfying

$$\frac{1}{n}E|\varphi_{\mu_i,n}(X^n)| \le h_i + \delta \qquad (\forall i = 1, 2, \cdots;\ \forall n \ge n_i) \tag{1.8.22}$$

and $n_1 < n_2 < \cdots \to +\infty$. If for $n = 1, 2, \cdots$ we denote by $i_n$ the integer $i$ satisfying $n_i \le n < n_{i+1}$ and define a weak prefix code $(\varphi_n, \psi_n)$ by

$$\varphi_n = \varphi_{\mu_{i_n},n}, \qquad \psi_n = \psi_{\mu_{i_n},n},$$

(1.8.21) implies that the error probability $\varepsilon_n = \Pr\{X^n \neq \psi_n(\varphi_n(X^n))\}$ satisfies

$$\varepsilon_n = \varepsilon_n(\mu_{i_n}) \le \mu_{i_n} \qquad (n = 1, 2, \cdots).$$

By noting that $\mu_{i_1} \ge \mu_{i_2} \ge \cdots \to 0$ as $n\to\infty$, we have

$$\lim_{n\to\infty}\varepsilon_n = 0. \tag{1.8.23}$$

On the other hand, since (1.8.22) leads to

$$\frac{1}{n}E|\varphi_n(X^n)| \le h_{i_n} + \delta = \limsup_{k\to\infty}\frac{1}{k}H_{[\mu_{i_n}]}(X^k) + \delta,$$

it follows that

$$\begin{aligned}
\limsup_{n\to\infty}\frac{1}{n}E|\varphi_n(X^n)| &\le \limsup_{n\to\infty}\limsup_{k\to\infty}\frac{1}{k}H_{[\mu_{i_n}]}(X^k) + \delta \\
&= \lim_{\mu\downarrow 0}\limsup_{k\to\infty}\frac{1}{k}H_{[\mu]}(X^k) + \delta \\
&= H^*(\mathbf{X}) + \delta,
\end{aligned}$$

where we use the fact that $\limsup_{k\to\infty}\frac{1}{k}H_{[\mu]}(X^k)$ is monotone decreasing with respect to $\mu$ for obtaining the first equality. Since $\delta > 0$ is arbitrary, we have

$$\limsup_{n\to\infty}\frac{1}{n}E|\varphi_n(X^n)| \le H^*(\mathbf{X}) \tag{1.8.24}$$

by letting $\delta\to 0$. Now, we can conclude that $H^*(\mathbf{X})$ is achievable as a rate of weak variable-length code from the combination of (1.8.23) and (1.8.24).

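2) Converse part:

Suppose that $R$ is achievable as the rate of a weak variable-length code. That is, suppose that there exists a weak variable-length code $(\varphi_n, \psi_n)$ satisfying

$$\lim_{n\to\infty}\varepsilon_n = 0 \quad\text{and}\quad \limsup_{n\to\infty}\frac{1}{n}E|\varphi_n(X^n)| \le R. \tag{1.8.25}$$

First, define

$$A_n = \{\mathbf{x}\in\mathcal{X}^n \mid \mathbf{x} = \psi_n(\varphi_n(\mathbf{x}))\}.$$

Then, we have

$$\varepsilon_n = \Pr\{X^n\notin A_n\}. \tag{1.8.26}$$

In addition, since the code $\varphi_n|_{A_n} : A_n \to \mathcal{U}^*$ obtained by restricting the domain of $\varphi_n$ to $A_n$ is a prefix code with no decoding error for any sources taking values in $A_n$, it holds from the proof 2) of Theorem 1.2.1 in §1.2 that

$$H(X^n|A_n) \le E[\,|\varphi_n(X^n)|\;|\;X^n\in A_n] \tag{1.8.27}$$

(more precisely, we need to use an extended version of the proof 2) that is valid for a source with an infinite alphabet). Then, since (1.8.26) implies that

$$H_{[\varepsilon_n]}(X^n) = \inf_{T_n:\,\Pr\{X^n\in T_n\}\ge 1-\varepsilon_n}H(X^n|T_n) \le H(X^n|A_n),$$

the combination of this inequality and (1.8.27) yields

$$H_{[\varepsilon_n]}(X^n) \le E[\,|\varphi_n(X^n)|\;|\;X^n\in A_n]. \tag{1.8.28}$$

On the other hand, it clearly holds that

$$E|\varphi_n(X^n)| \ge \Pr\{X^n\in A_n\}\,E[\,|\varphi_n(X^n)|\;|\;X^n\in A_n],$$

which yields

$$E|\varphi_n(X^n)| \ge \Pr\{X^n\in A_n\}\,H_{[\varepsilon_n]}(X^n) = (1-\varepsilon_n)H_{[\varepsilon_n]}(X^n). \tag{1.8.29}$$

Now, let $\varepsilon$ be an arbitrarily small constant satisfying $0 < \varepsilon < 1$. Then, (1.8.25) implies that

$$\varepsilon_n \le \varepsilon \qquad (\forall n \ge n_0).$$

By noticing that $H_{[\varepsilon]}(X^n)$ is monotone decreasing with respect to $\varepsilon$, (1.8.29) can be written as

$$E|\varphi_n(X^n)| \ge (1-\varepsilon)H_{[\varepsilon]}(X^n) \qquad (\forall n \ge n_0).$$

Consequently, it follows that

$$\limsup_{n\to\infty}\frac{1}{n}E|\varphi_n(X^n)| \ge (1-\varepsilon)\limsup_{n\to\infty}\frac{1}{n}H_{[\varepsilon]}(X^n).$$

Since $0 < \varepsilon < 1$ is arbitrary, we obtain

$$\limsup_{n\to\infty}\frac{1}{n}E|\varphi_n(X^n)| \ge \lim_{\varepsilon\downarrow 0}\limsup_{n\to\infty}\frac{1}{n}H_{[\varepsilon]}(X^n) = H^*(\mathbf{X})$$

by letting $\varepsilon\downarrow 0$. By noting (1.8.25) again, $R \ge H^*(\mathbf{X})$ follows. □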



Remark 1.8.4. Since the weak variable-length coding includes the fixed-length coding described in §1.3 as a special case, $R_v^*(\mathbf{X}) \le R_f(\mathbf{X})$ must hold. Therefore, it always holds from Theorem 1.3.1 and Theorem 1.8.2 that

$$H^*(\mathbf{X}) \le \overline{H}(\mathbf{X}). \tag{1.8.30}$$

In addition, Theorem 1.8.1 and Theorem 1.8.2 imply that $H^*(\mathbf{X}) = H(\mathbf{X})$ provided that a source $\mathbf{X}$ satisfies the uniform integrability. □



Example 1.8.1. Let us compute the weak sup-entropy rate $H^*(\mathbf{X})$ of the source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ given in Remark 1.7.6. First, fix a constant $0 < \varepsilon < 1$ arbitrarily. Since $\delta_n = \frac{1}{n} \le \varepsilon$ for all $n \ge n_0$, we can choose a subset $A_n\subset S_n$ satisfying $\Pr\{X^n\in A_n\} \ge 1-\varepsilon$. Therefore,

$$H(X^n|A_n) \ge H_{[\varepsilon]}(X^n).$$

By noticing that $X^n$ is uniformly distributed over $A_n$ under the condition of $X^n\in A_n$, we obtain

$$H(X^n|A_n) = \log|A_n| \le \log|S_n| = n\log 2.$$

This means that $H_{[\varepsilon]}(X^n) \le n\log 2$, which leads to

$$H^*(\mathbf{X}) = \lim_{\varepsilon\downarrow 0}\limsup_{n\to\infty}\frac{1}{n}H_{[\varepsilon]}(X^n) \le \log 2. \tag{1.8.31}$$

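On the other hand, consider an arbitrary subset $A_n\subset\mathcal{X}^n$ satisfying $\Pr\{X^n\in A_n\} \ge 1-\varepsilon$ ($0 < \varepsilon < 1$) and express $\Pr\{X^n\in A_n\}$ as

$$\Pr\{X^n\in A_n\} = \Pr\{X^n\in A_n\cap S_n\} + \Pr\{X^n\in A_n\cap T_n\}.$$

Define $\varepsilon_n = \Pr\{X^n\notin A_n\}$ and $\alpha_n = \Pr\{X^n\in A_n\cap T_n\}$. Since $X^n$ is uniformly distributed under the conditions of $X^n\in A_n\cap S_n$ or $X^n\in A_n\cap T_n$, the chain rule of the entropy yields

$$H(X^n|A_n) = h\!\left(\frac{\alpha_n}{1-\varepsilon_n}\right) + \frac{1-\varepsilon_n-\alpha_n}{1-\varepsilon_n}H(X^n|A_n\cap S_n) + \frac{\alpha_n}{1-\varepsilon_n}H(X^n|A_n\cap T_n) \tag{1.8.32}$$

$$\phantom{H(X^n|A_n)} = h\!\left(\frac{\alpha_n}{1-\varepsilon_n}\right) + \frac{1-\varepsilon_n-\alpha_n}{1-\varepsilon_n}\log|A_n\cap S_n| + \frac{\alpha_n}{1-\varepsilon_n}\log|A_n\cap T_n|, \tag{1.8.33}$$

where $h(\cdot)$ denotes the binary entropy. By noticing

$$|A_n\cap S_n| = \frac{1-\varepsilon_n-\alpha_n}{1-\delta_n}\,2^n, \qquad |A_n\cap T_n| = \frac{\alpha_n}{\delta_n}\,2^{n^3},$$

(1.8.33) is evaluated in the following way:

$$\begin{aligned}
H(X^n|A_n) &\ge h\!\left(\frac{\alpha_n}{1-\varepsilon_n}\right) + \frac{1-\varepsilon_n-\alpha_n}{1-\varepsilon_n}\left[n\log 2 + \log(1-\varepsilon_n-\alpha_n)\right] + \frac{\alpha_n}{1-\varepsilon_n}\left[n^3\log 2 + \log\alpha_n\right] \\
&= \left[n + \frac{(n^3-n)\alpha_n}{1-\varepsilon_n}\right]\log 2 + \log(1-\varepsilon_n) \\
&\ge n\log 2 + \log(1-\varepsilon_n) \\
&\ge n\log 2 + \log(1-\varepsilon),
\end{aligned}$$

where $\varepsilon_n \le \varepsilon$ is used for obtaining the last inequality. Since $A_n\subset\mathcal{X}^n$ is arbitrary as far as it satisfies $\Pr\{X^n\in A_n\} \ge 1-\varepsilon$, it follows that

$$H_{[\varepsilon]}(X^n) = \inf_{A_n:\,\Pr\{X^n\in A_n\}\ge 1-\varepsilon}H(X^n|A_n) \ge n\log 2 + \log(1-\varepsilon),$$

which leads to

$$H^*(\mathbf{X}) = \lim_{\varepsilon\downarrow 0}\limsup_{n\to\infty}\frac{1}{n}H_{[\varepsilon]}(X^n) \ge \log 2. \tag{1.8.34}$$

We finally obtain

$$H^*(\mathbf{X}) = \log 2$$

from the combination of (1.8.31) and (1.8.34). Theorem 1.8.2 claims that the infimum achievable weak variable-length coding rate for this $\mathbf{X}$ is given by

$$R_v^*(\mathbf{X}) = H_2^*(\mathbf{X}) = \log_2 2 = 1$$

(see Remark 1.8.2). In addition, we have

$$H^*(\mathbf{X}) < H(\mathbf{X}),$$

since Remark 1.7.6 tells us that $H(\mathbf{X}) = +\infty$ for this $\mathbf{X}$. □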
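The lower bound $H(X^n|A_n) \ge n\log 2 + \log(1-\varepsilon)$ used in Example 1.8.1 can be spot-checked numerically. The following is a minimal sketch, assuming the source of Remark 1.7.6 with $|S_n| = 2^n$, $|T_n| = 2^{n^3}$ and $\delta_n = 1/n$ (the exponent $n^3$ is an assumed reading of the garbled source). A subset $A_n$ is described only by the masses `beta` and `alpha` it carries inside $S_n$ and $T_n$, and set sizes are treated as real numbers, which is a relaxation; the function names and grid are hypothetical.

```python
import math

def conditional_entropy(n, beta, alpha):
    """H(X^n | A_n) in nats when A_n has mass beta in S_n and alpha in T_n (n >= 2)."""
    delta = 1.0 / n
    mass = beta + alpha
    h = 0.0
    if beta > 0:
        w = beta / mass
        log_size = n * math.log(2) + math.log(beta / (1 - delta))   # log |A_n ∩ S_n|
        h += w * (log_size - math.log(w))
    if alpha > 0:
        w = alpha / mass
        log_size = n**3 * math.log(2) + math.log(alpha / delta)     # log |A_n ∩ T_n|
        h += w * (log_size - math.log(w))
    return h

def worst_gap(n, eps, grid=40):
    """Smallest value of H(X^n|A_n) - (n log 2 + log(1 - eps)) over a grid of admissible A_n."""
    delta = 1.0 / n
    bound = n * math.log(2) + math.log(1 - eps)
    worst = math.inf
    for i in range(grid + 1):
        for j in range(grid + 1):
            beta = (1 - delta) * i / grid
            alpha = delta * j / grid
            if beta + alpha >= 1 - eps:
                worst = min(worst, conditional_entropy(n, beta, alpha) - bound)
    return worst

for n in (6, 8, 12):
    print(n, worst_gap(n, eps=0.3))
```

The printed gaps are nonnegative, and the smallest ones come from subsets lying entirely inside $S_n$: including any part of $T_n$ only raises the conditional entropy.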






So far we have defined four quantities: the spectral sup-entropy rate $\overline{H}(\mathbf{X})$, the spectral inf-entropy rate $\underline{H}(\mathbf{X})$, the sup-entropy rate $H(\mathbf{X})$ and the weak sup-entropy rate $H^*(\mathbf{X})$. They have the following relationship.



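Theorem 1.8.3. For any source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ it holds that

$$\underline{H}(\mathbf{X}) \le H^*(\mathbf{X}) \le \min(\overline{H}(\mathbf{X}), H(\mathbf{X})). \tag{1.8.35}$$

In particular, if $\mathbf{X}$ satisfies uniform integrability, it holds that

$$\underline{H}(\mathbf{X}) \le H^*(\mathbf{X}) = H(\mathbf{X}) \le \overline{H}(\mathbf{X}) \tag{1.8.36}$$

(see Theorem 1.7.2, Remark 1.7.4 and Remark 1.8.4).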

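Remark 1.8.5. From Theorem 1.3.1, Theorem 1.7.1 and Theorem 1.8.2, (1.8.35) and (1.8.36) can be expressed as

$$\underline{H}(\mathbf{X}) \le R_v^*(\mathbf{X}) \le \min(R_f(\mathbf{X}), R_v(\mathbf{X})) \tag{1.8.37}$$

and

$$\underline{H}(\mathbf{X}) \le R_v^*(\mathbf{X}) = R_v(\mathbf{X}) \le R_f(\mathbf{X}), \tag{1.8.38}$$

respectively. □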



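Proof of Theorem 1.8.3.

It is sufficient to prove

$$\underline{H}(\mathbf{X}) \le H^*(\mathbf{X}) \tag{1.8.39}$$

and

$$H^*(\mathbf{X}) \le \min(\overline{H}(\mathbf{X}), H(\mathbf{X})). \tag{1.8.40}$$

Since (1.8.40) is obvious from (1.8.12) and (1.8.30), we will prove (1.8.39) hereafter. Let $\gamma > 0$ be an arbitrarily small constant and define

$$S_n = \left\{\mathbf{x}\in\mathcal{X}^n \,\middle|\, \frac{1}{n}\log\frac{1}{P_{X^n}(\mathbf{x})} \ge \underline{H}(\mathbf{X}) - \gamma\right\}.$$

Denoting $\delta_n = \Pr\{X^n\in S_n^c\}$, the definition of $\underline{H}(\mathbf{X})$ implies that

$$\delta_n \to 0 \quad \text{as } n\to\infty. \tag{1.8.41}$$

Now, fix a constant $0 < \varepsilon < 1$ arbitrarily and consider an arbitrary subset $A_n\subset\mathcal{X}^n$ satisfying $\Pr\{X^n\in A_n\} \ge 1-\varepsilon$. Set

$$\varepsilon_n = \Pr\{X^n\notin A_n\}, \qquad \alpha_n = \Pr\{X^n\in A_n\cap S_n^c\}. \tag{1.8.42}$$

Then, it follows from (1.8.32) with $T_n = S_n^c$ that

$$\begin{aligned}
H(X^n|A_n) &\ge \frac{1-\varepsilon_n-\alpha_n}{1-\varepsilon_n}H(X^n|A_n\cap S_n) \\
&= \left(1-\frac{\alpha_n}{1-\varepsilon_n}\right)H(X^n|A_n\cap S_n) \\
&\ge \left(1-\frac{\delta_n}{1-\varepsilon}\right)H(X^n|A_n\cap S_n),
\end{aligned} \tag{1.8.43}$$

where $\varepsilon_n \le \varepsilon$ and $\alpha_n \le \delta_n$ are used. On the other hand, since (1.8.42) implies

$$\Pr\{X^n\in A_n\cap S_n\} = 1-\varepsilon_n-\alpha_n,$$

we have

$$H(X^n|A_n\cap S_n) = \sum_{\mathbf{x}\in A_n\cap S_n}\frac{P_{X^n}(\mathbf{x})}{1-\varepsilon_n-\alpha_n}\log\frac{1-\varepsilon_n-\alpha_n}{P_{X^n}(\mathbf{x})}. \tag{1.8.44}$$

By using $P_{X^n}(\mathbf{x}) \le e^{-n(\underline{H}(\mathbf{X})-\gamma)}$ for $\mathbf{x}\in S_n$, (1.8.44) is evaluated in the following way:

$$\begin{aligned}
H(X^n|A_n\cap S_n) &\ge n\sum_{\mathbf{x}\in A_n\cap S_n}\frac{P_{X^n}(\mathbf{x})}{1-\varepsilon_n-\alpha_n}(\underline{H}(\mathbf{X})-\gamma) + \sum_{\mathbf{x}\in A_n\cap S_n}\frac{P_{X^n}(\mathbf{x})}{1-\varepsilon_n-\alpha_n}\log(1-\varepsilon_n-\alpha_n) \\
&= n(\underline{H}(\mathbf{X})-\gamma) + \log(1-\varepsilon_n-\alpha_n) \\
&\ge n(\underline{H}(\mathbf{X})-\gamma) + \log(1-\varepsilon-\delta_n).
\end{aligned}$$

Consequently, we obtain

$$H(X^n|A_n) \ge n\left(1-\frac{\delta_n}{1-\varepsilon}\right)(\underline{H}(\mathbf{X})-\gamma) + \left(1-\frac{\delta_n}{1-\varepsilon}\right)\log(1-\varepsilon-\delta_n)$$

from (1.8.43). By recalling that $A_n\subset\mathcal{X}^n$ is arbitrary as far as it satisfies $\Pr\{X^n\in A_n\} \ge 1-\varepsilon$, we have

$$\begin{aligned}
H_{[\varepsilon]}(X^n) &= \inf_{A_n:\,\Pr\{X^n\in A_n\}\ge 1-\varepsilon}H(X^n|A_n) \\
&\ge n\left(1-\frac{\delta_n}{1-\varepsilon}\right)(\underline{H}(\mathbf{X})-\gamma) + \left(1-\frac{\delta_n}{1-\varepsilon}\right)\log(1-\varepsilon-\delta_n),
\end{aligned}$$

which, together with (1.8.41), leads to

$$\limsup_{n\to\infty}\frac{1}{n}H_{[\varepsilon]}(X^n) \ge \underline{H}(\mathbf{X}) - \gamma.$$

Therefore, we have $H^*(\mathbf{X}) \ge \underline{H}(\mathbf{X}) - \gamma$. Since $\gamma > 0$ is arbitrary, we obtain $H^*(\mathbf{X}) \ge \underline{H}(\mathbf{X})$ by letting $\gamma\to 0$. □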






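Remark 1.8.6. As can be seen from the proof above, we actually have

$$H^*(\mathbf{X}) \ge \limsup_{n\to\infty}\frac{1}{n}H_{[\varepsilon]}(X^n) \ge \liminf_{n\to\infty}\frac{1}{n}H_{[\varepsilon]}(X^n) \ge \underline{H}(\mathbf{X})$$

for an arbitrarily fixed $0 < \varepsilon < 1$. □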



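Theorem 1.5.1 and Theorem 1.8.3 immediately yield the following corollary.

Corollary 1.8.1. If a source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ satisfies the strong converse property,

$$\underline{H}(\mathbf{X}) = H^*(\mathbf{X}) = \overline{H}(\mathbf{X}). \tag{1.8.45}$$

If $\mathbf{X}$ satisfies the uniform integrability in addition to the strong converse property, (1.8.45) can be written as

$$\underline{H}(\mathbf{X}) = H^*(\mathbf{X}) = H(\mathbf{X}) = \overline{H}(\mathbf{X}). \tag{1.8.46}$$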

Remark 1.8.7. If a source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is stationary and ergodic, (1.8.46) holds from Remark 1.7.5 and (1.8.45) even if $\mathbf{X}$ does not satisfy the uniform integrability. Thus we have

$$R_v^*(\mathbf{X}) = R_v(\mathbf{X}) = R_f(\mathbf{X}) = H(\mathbf{X})$$

for stationary ergodic sources. □



1.9 Source Coding and Large Deviation: Decoding Error Probability

In §1.3 and §1.6 we considered the fixed-length source coding under the requirement that the error probability $\varepsilon_n$ satisfies $\varepsilon_n \to 0$ or is asymptotically bounded by a constant $0 \le \varepsilon < 1$. In this section we consider the fixed-length source coding under a stronger requirement on the error probability. That is, we require the error probability $\varepsilon_n$ to asymptotically satisfy

$$\varepsilon_n \le e^{-nr} \tag{1.9.1}$$

for a given constant r > 0. In this fixed-length coding it is fundamental to 
consider how we can make the coding rate small subject to the constraint 
(1.9.1). Under this problem formulation we need to treat the source coding 
from the viewpoint of the large deviation theory. As readers will see below, the 
information-spectrum approach provides quite powerful methods for dealing 
with such a problem. 

We start with giving definitions. 






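Definition 1.9.1.

Rate $R$ is $r$-achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists an $(n, M_n, \varepsilon_n)$-code satisfying

$$\liminf_{n\to\infty}\frac{1}{n}\log\frac{1}{\varepsilon_n} \ge r \quad\text{and}\quad \limsup_{n\to\infty}\frac{1}{n}\log M_n \le R.$$

Definition 1.9.2 (Infimum $r$-achievable fixed-length coding rate: Part I).

$$R_e(r|\mathbf{X}) = \inf\{R \mid R \text{ is } r\text{-achievable}\}.$$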



The objective of this section is to find this $R_e(r|\mathbf{X})$ as a (left-continuous and monotone increasing) function of $r$. In fact, the inverse of $R_e(r|\mathbf{X})$ is called the reliability function of a source $\mathbf{X}$. To this end, let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ be a general source and define

$$\sigma(R) = \liminf_{n\to\infty}\frac{1}{n}\log\frac{1}{\Pr\left\{\frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R\right\}}. \tag{1.9.2}$$

Clearly, $\sigma(R)$ is a monotone increasing function of $R$. Note, however, that $\sigma(R)$ is not continuous in general.



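Lemma 1.9.1. $\sigma(R) = 0$ for $R < \overline{H}(\mathbf{X})$.

Proof. If $R < \overline{H}(\mathbf{X})$, the definition of $\overline{H}(\mathbf{X})$ implies that there are infinitely many $n$ satisfying

$$\Pr\left\{\frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R\right\} \ge \varepsilon_0$$

for some $0 < \varepsilon_0 < 1$. This inequality immediately yields

$$\frac{1}{n}\log\frac{1}{\Pr\left\{\frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R\right\}} \le \frac{1}{n}\log\frac{1}{\varepsilon_0}.$$

Consequently, we have

$$\sigma(R) \le \liminf_{n\to\infty}\frac{1}{n}\log\frac{1}{\varepsilon_0} = 0.$$

□

This lemma means that $R \ge \overline{H}(\mathbf{X})$ must be satisfied for $\sigma(R) > 0$ to hold. We have the following quite general theorem. Here, recall that $\mathcal{X}$ is a countably infinite source alphabet in general.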
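Lemma 1.9.1 can be observed at finite block length for a simple source. The following is a minimal sketch, assuming (for illustration only) an i.i.d. Bernoulli($p$) source, for which $\overline{H}(\mathbf{X})$ is the ordinary entropy; the function names and parameter values are hypothetical. It evaluates the finite-$n$ quantity $\frac{1}{n}\log\frac{1}{\Pr\{\frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R\}}$ exactly from the binomial distribution, without taking the $\liminf$.

```python
import math

def sigma_n(n, p, R):
    """Finite-n counterpart of sigma(R) for an i.i.d. Bernoulli(p) source (natural logs)."""
    log_terms = []
    for k in range(n + 1):
        # log-probability of any single sequence containing k ones
        log_seq = k * math.log(p) + (n - k) * math.log(1 - p)
        if -log_seq / n >= R:
            log_count = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            log_terms.append(log_count + log_seq)
    if not log_terms:
        return math.inf  # the event {(1/n) log 1/P >= R} has probability zero
    m = max(log_terms)
    log_prob = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return max(0.0, -log_prob / n)  # clamp tiny negative rounding errors

def entropy(p):
    """Entropy of Bernoulli(p) in nats."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

n, p = 500, 0.2  # illustrative values only
H = entropy(p)
for factor in (0.5, 0.9, 1.1, 1.2, 1.5):
    print(factor, sigma_n(n, p, factor * H))
```

Below the entropy the printed exponent is essentially zero, while above it the exponent becomes strictly positive and grows with $R$, in line with the lemma and with the monotonicity of $\sigma(R)$.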






Theorem 1.9.1 (Han [39]). For any $r \ge 0$

$$R_e(r|\mathbf{X}) = \sup_{R\ge 0}\{R - \sigma(R) \mid \sigma(R) < r\}, \tag{1.9.3}$$

where $R_e(0|\mathbf{X})$ ($r = 0$) is defined as 0.

Remark 1.9.1. Note that the condition on the right-hand side of (1.9.3) is $\sigma(R) < r$, not $\sigma(R) \le r$. This is an essential difference, as can be seen from the following proof. □



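Remark 1.9.2. While we have

$$\sup_{0\le R<\overline{H}(\mathbf{X})}\{R - \sigma(R) \mid \sigma(R) < r\} = \sup_{0\le R<\overline{H}(\mathbf{X})}R \tag{1.9.4}$$

from Lemma 1.9.1, $\sup$ on the right-hand side is attained at $R = \overline{H}(\mathbf{X})$. Therefore, $\sup_{R\ge 0}$ on the right-hand side of (1.9.3) can be replaced with $\sup_{R\ge\overline{H}(\mathbf{X})}$ provided that $\sigma(R)$ is continuous at $R = \overline{H}(\mathbf{X})$. □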



Proof of Theorem 1.9.1. 

1) Direct part: 

Denote by T^(a) the set defined by 



Tn{a) 

Define 



|x € A'” 



n Fx" (x) 




(1.9.5) 



R — sup {R > 0 I (j{R) < r} (1.9.6) 

and set = \Tn{R-\-'y)\, where 7 > 0 is an arbitrarily small constant. Next, 
define an encoding function 



as a mapping that maps each element of Tn{R + 7) to a distinct element of 
A4n in the order of 1, 2, • • • and all elements not belonging to Tn{R + 7) to 
1. We define a decoding function 

'f’n ’ M-n — > Tn{R-\- 7) 

as the inverse mapping of Then, the error probability Sn of this 

coding is given by 

£„ = Pr{X"^T„(l + 7)} 



66 



1 Source Coding 



Thus, in view of the definition of cr(i^) we have 
liminf — log — = a{R -f- 7). 

TL ^00 77 / ^TL 



We notice here that a{R-\-'y) >r from ( 1 . 9 . 6 ). Consequently, we obtain 

( 1 . 9 . 7 ) 

Next, we evaluate M^. First, define 

R + j 



lim inf — log — > r. 

n— »-oo n Sji 



L = 



27 



Denote by 

/j = [2(j - 1)7, 217) = 

the L nonoverlapping subintervals in [ 0 ,i? + 7), each of which has a width 
27. We partition the set 



Tn(i^ + 7)-ixGA' 



0 < - log + 7 j 

n Px-(x) J 

into the following L subsets according to this partition of the interval 
{information- spectrum slicing): 



rp{i) 

r) 






xe 






C h 



(z — 1, • • • , L). 



Clearly, it holds that 
L 

Tn{R + l) = \jT^\ 



i=l 



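Since

$$\Pr\{X^n \in T_n^{(i)}\} \le \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge 2(i-1)\gamma \right\},$$

we have

$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\Pr\{X^n \in T_n^{(i)}\}} \ge \sigma(2(i-1)\gamma),$$

which implies

$$\Pr\{X^n \in T_n^{(i)}\} \le e^{-n[\sigma(2(i-1)\gamma)-\gamma]} \quad (\forall n \ge n_0).$$

On the other hand, since $P_{X^n}(\mathbf{x}) \ge e^{-2ni\gamma}$ for any $\mathbf{x} \in T_n^{(i)}$, it follows that

$$e^{-n[\sigma(2(i-1)\gamma)-\gamma]} \ge \Pr\{X^n \in T_n^{(i)}\} = \sum_{\mathbf{x} \in T_n^{(i)}} P_{X^n}(\mathbf{x}) \ge \sum_{\mathbf{x} \in T_n^{(i)}} e^{-2ni\gamma} = |T_n^{(i)}|\, e^{-2ni\gamma}.$$

This shows that

$$|T_n^{(i)}| \le e^{n[2i\gamma-\sigma(2(i-1)\gamma)+\gamma]}.$$

Consequently, we have

$$M_n = |T_n(\overline{R}+\gamma)| \le \sum_{i=1}^{L} e^{n[2i\gamma-\sigma(2(i-1)\gamma)+\gamma]} = \sum_{i=1}^{L} e^{n[2(i-1)\gamma-\sigma(2(i-1)\gamma)+3\gamma]}. \tag{1.9.8}$$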

We notice here that, since $2(i-1)\gamma \in [0, \overline{R}-\gamma]$ for all $i = 1, \ldots, L$, and hence $\sigma(2(i-1)\gamma) < r$, it holds that

$$2(i-1)\gamma - \sigma(2(i-1)\gamma) \le \rho_0 \quad (i = 1, \ldots, L),$$

where $\rho_0$ is defined by

$$\rho_0 = \sup_{R \ge 0}\{R - \sigma(R) \mid \sigma(R) < r\}. \tag{1.9.9}$$

Therefore, (1.9.8) can be written as $M_n \le L e^{n(\rho_0+3\gamma)}$. This inequality immediately yields

$$\limsup_{n\to\infty} \frac{1}{n}\log M_n \le \rho_0 + 3\gamma.$$

The combination of this with (1.9.7) means that $\rho_0 + 3\gamma$ is $r$-achievable (recall that $\gamma > 0$ can be arbitrarily small).
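The counting step behind (1.9.8) can be checked numerically on a toy source. The following is a minimal sketch, assuming an i.i.d. Bernoulli($p$) source and natural logarithms; the function name `slice_statistics` and the parameter values are illustrative choices, not part of the original argument. It groups sequences by their number of ones, places each group into its slice, and verifies that every slice satisfies $|T_n^{(i)}| \le \Pr\{X^n \in T_n^{(i)}\}\, e^{2ni\gamma}$.

```python
import math

def slice_statistics(n, p, gamma, r_bar):
    """Slice T_n(r_bar + gamma) for an i.i.d. Bernoulli(p) source (natural logs).

    Returns one (size, probability) pair per subinterval
    [2(i-1)gamma, 2i*gamma), i = 1, ..., L.
    """
    num_slices = math.ceil((r_bar + gamma) / (2 * gamma))
    sizes = [0] * num_slices
    probs = [0.0] * num_slices
    for ones in range(n + 1):
        # All sequences with the same number of ones share one probability.
        log_prob = ones * math.log(p) + (n - ones) * math.log(1 - p)
        density = -log_prob / n              # (1/n) log 1/P(x)
        if density >= r_bar + gamma:
            continue                         # outside T_n(r_bar + gamma)
        i = min(int(density // (2 * gamma)), num_slices - 1)
        count = math.comb(n, ones)
        sizes[i] += count
        probs[i] += count * math.exp(log_prob)
    return list(zip(sizes, probs))

# Illustrative parameters (assumptions, not taken from the text).
n, p, gamma, r_bar = 30, 0.3, 0.05, 0.8
for i, (size, prob) in enumerate(slice_statistics(n, p, gamma, r_bar), start=1):
    # Each x in slice i has P(x) >= exp(-2 n i gamma), hence the size bound.
    assert size <= prob * math.exp(2 * n * i * gamma) * (1 + 1e-9)
```

Summing the per-slice bounds over $i$ is exactly what produces the right-hand side of (1.9.8).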



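2) Converse part:

Define $\overline{R}$ and $\rho_0$ by (1.9.6) and (1.9.9), respectively. Since $\sigma(R)$ is monotone increasing, there exists $0 \le R_0 \le \overline{R}$ such that

$$\lim_{\varepsilon \downarrow 0}\,(R_0 - \varepsilon - \sigma(R_0 - \varepsilon)) = \rho_0. \tag{1.9.10}$$

Then, for any $\gamma > 0$

$$\sigma(R_0 - \gamma) < \sigma(R_0 + \gamma) \tag{1.9.11}$$

must hold for the following reason. Assume that $\sigma(R_0 - \gamma_0) \ge \sigma(R_0 + \gamma_0)$ for some $\gamma_0 > 0$. This leads to

$$\sigma(R_0 - \gamma_0) = \sigma(R_0 + \gamma_0) \tag{1.9.12}$$

because $\sigma(R)$ is monotone increasing in $R$. On the other hand, $R_0 - \gamma_0 < \overline{R}$ follows from $R_0 \le \overline{R}$, where $\overline{R}$ is defined by (1.9.6). Thus, $\sigma(R_0 - \gamma_0) < r$. Owing to (1.9.12), we obtain $\sigma(R_0 + \gamma_0) < r$. Here, since (1.9.12) guarantees that for any $\varepsilon > 0$ satisfying $\varepsilon < \gamma_0$

$$R_0 + \gamma_0 - \sigma(R_0 + \gamma_0) = R_0 + \gamma_0 - \sigma(R_0 - \varepsilon) = R_0 - \varepsilon - \sigma(R_0 - \varepsilon) + (\gamma_0 + \varepsilon),$$

by letting $\varepsilon \downarrow 0$ we obtain

$$R_0 + \gamma_0 - \sigma(R_0 + \gamma_0) = \rho_0 + \gamma_0 \tag{1.9.13}$$

from (1.9.10). By using $\sigma(R_0 + \gamma_0) < r$, (1.9.13) leads to

$$\rho_0 = \sup_{R \ge 0}\{R - \sigma(R) \mid \sigma(R) < r\} \ge R_0 + \gamma_0 - \sigma(R_0 + \gamma_0) = \rho_0 + \gamma_0,$$

which is a contradiction because $\gamma_0 > 0$ is assumed. Thus, (1.9.11) must be satisfied. Now, set $I_0 = [R_0 - \gamma, R_0 + \gamma)$ and define

$$T_0 = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n}\log\frac{1}{P_{X^n}(\mathbf{x})} \in I_0 \right\}.$$

By using the notation in (1.9.5) the probability of $T_0$ can be expressed as

$$\Pr\{X^n \in T_0\} = \Pr\{X^n \notin T_n(R_0 - \gamma)\} - \Pr\{X^n \notin T_n(R_0 + \gamma)\}.$$

We notice here that the definition of $\sigma(R)$ implies that

$$\Pr\{X^n \notin T_n(R_0 + \gamma)\} \le e^{-n(\sigma(R_0+\gamma)-\tau)} \quad (\forall n \ge n_0)$$

for an arbitrarily small $\tau > 0$. The definition of $\sigma(R)$ also implies that

$$\Pr\{X^{n_j} \notin T_{n_j}(R_0 - \gamma)\} \ge e^{-n_j(\sigma(R_0-\gamma)+\tau)}$$

for some divergent sequence $n_1 < n_2 < \cdots \to \infty$. Therefore, we obtain

$$\Pr\{X^{n_j} \in T_0\} \ge e^{-n_j(\sigma(R_0-\gamma)+\tau)} - e^{-n_j(\sigma(R_0+\gamma)-\tau)}.$$

We note that (1.9.11) means the existence of a sufficiently small $\tau > 0$ such that

$$\sigma(R_0 - \gamma) + \tau < \sigma(R_0 + \gamma) - \tau.$$

Consequently, we have

$$\Pr\{X^{n_j} \in T_0\} \ge \frac{1}{2}\, e^{-n_j(\sigma(R_0-\gamma)+\tau)} \quad (\forall j \ge j_0). \tag{1.9.14}$$

Now, assume that $R = \rho_0 - 2\lambda$ ($\lambda > 0$) is $r$-achievable. That is, assume that there exists an $(n, M_n, \varepsilon_n)$-code with an encoding function $\varphi_n$ and a decoding function $\psi_n$ satisfying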



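$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\varepsilon_n} \ge r \tag{1.9.15}$$

and

$$\limsup_{n\to\infty} \frac{1}{n}\log M_n \le R. \tag{1.9.16}$$

We prove the converse part by showing that this assumption leads to a contradiction. Define

$$A_0 = \{\mathbf{x} \in \mathcal{X}^n \mid \mathbf{x} = \psi_n(\varphi_n(\mathbf{x}))\}.$$

Since $P_{X^n}(\mathbf{x}) \le e^{-n(R_0-\gamma)}$ for $\mathbf{x} \in T_0$, it follows that

$$\Pr\{X^n \in A_0 \cap T_0\} = \sum_{\mathbf{x} \in A_0 \cap T_0} P_{X^n}(\mathbf{x}) \le \sum_{\mathbf{x} \in A_0 \cap T_0} e^{-n(R_0-\gamma)} \le |A_0|\, e^{-n(R_0-\gamma)}.$$

Notice here that (1.9.16) implies

$$M_n \le e^{n(R+\lambda)} = e^{n(\rho_0-\lambda)} \quad (\forall n \ge n_0).$$

By using $|A_0| \le M_n$ we have

$$\Pr\{X^n \in A_0 \cap T_0\} \le e^{n(\rho_0-\lambda)}\, e^{-n(R_0-\gamma)} = e^{-n(R_0-\gamma-\rho_0+\lambda)}.$$

On the other hand, since $\rho_0$ can be expressed as

$$\rho_0 = R_0 - \gamma - \sigma(R_0 - \gamma) + \delta(\gamma)$$

from (1.9.10), where $\delta(\gamma) \to 0$ as $\gamma \downarrow 0$, we obtain

$$\Pr\{X^n \in A_0 \cap T_0\} \le e^{-n(\sigma(R_0-\gamma)+\lambda-\delta(\gamma))}. \tag{1.9.17}$$

Consequently, it follows from (1.9.14) and (1.9.17) that

$$\begin{aligned}
\varepsilon_{n_j} &= \Pr\{X^{n_j} \in A_0^c\} \\
&\ge \Pr\{X^{n_j} \in A_0^c \cap T_0\} \\
&= \Pr\{X^{n_j} \in T_0\} - \Pr\{X^{n_j} \in A_0 \cap T_0\} \\
&\ge \frac{1}{2}\, e^{-n_j(\sigma(R_0-\gamma)+\tau)} - e^{-n_j(\sigma(R_0-\gamma)+\lambda-\delta(\gamma))},
\end{aligned}$$

where we choose sufficiently small $\tau > 0$ and $\gamma > 0$ satisfying $\lambda - \delta(\gamma) > \tau$. Hence,

$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\varepsilon_n} \le \liminf_{j\to\infty} \frac{1}{n_j}\log\frac{1}{\varepsilon_{n_j}} \le \sigma(R_0 - \gamma) + \tau. \tag{1.9.18}$$

However, (1.9.18) contradicts (1.9.15) since, owing to $\sigma(R_0 - \gamma) < r$, we can choose sufficiently small $\tau > 0$ such that $\sigma(R_0 - \gamma) + \tau < r$. This shows that $R \ge \rho_0$ must be satisfied if $R$ is $r$-achievable. □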



Example 1.9.1. First, let us apply Theorem 1.9.1 to a stationary memoryless source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ subject to a probability distribution $P_X$ over $\mathcal{X}$, where $\mathcal{X}$ is a finite alphabet. Denoting by $T_{\mathbf{x}}$ the type of $\mathbf{x} \in \mathcal{X}^n$ (see the proof of Lemma 1.4.4 in §1.4),

$$T_n^c(R) = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n}\log\frac{1}{P_{X^n}(\mathbf{x})} \ge R \right\}$$

can be expressed as

$$T_n^c(R) = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \sum_{x \in \mathcal{X}} T_{\mathbf{x}}(x)\log\frac{1}{P_X(x)} \ge R \right\}.$$

Denote by $\mathcal{P}(\mathcal{X})$ the set of all probability distributions over $\mathcal{X}$ and define the plane

$$\pi_R = \left\{ Q \in \mathcal{P}(\mathcal{X}) \,\middle|\, \sum_{x \in \mathcal{X}} Q(x)\log\frac{1}{P_X(x)} = R \right\} \tag{1.9.19}$$

in $\mathcal{P}(\mathcal{X})$. We call the probability distribution $P_R$ determined by

$$\inf_{Q \in \pi_R} D(Q\|P_X) = D(P_R\|P_X)$$

the projection of $P_X$ to $\pi_R$ (see Fig. 1.13). The projection $P_R$ is uniquely determined for each $R$. From Sanov's Theorem (cf. Bucklew [12], Dembo and Zeitouni [22]) on the large deviation, we have $\sigma(R) = 0$ for $R \le H(P_X)$ and $\sigma(R) = D(P_R\|P_X)$ for $H(P_X) \le R$. This $\sigma(R)$ is a monotone increasing and continuous function of $R$. Since $\overline{H}(\mathbf{X}) = H(P_X)$, we have only to consider $R$ satisfying $R \ge H(P_X)$ in (1.9.3) (see Remark 1.9.2). We notice here that, since $P_R \in \pi_R$, (1.9.19) yields

$$D(P_R\|P_X) + H(P_R) = R, \tag{1.9.20}$$

which means that $\sigma(R) + H(P_R) = R$. Thus, we obtain $R - \sigma(R) = H(P_R)$. Note that (1.9.20) and the definition of $P_R$ lead to $H(P_R) = \sup_{Q \in \pi_R} H(Q)$. Then, we obtain the formula

$$R_e(r|\mathbf{X}) = \sup_{R \ge 0}\{H(P_R) \mid D(P_R\|P_X) < r\}$$

from Theorem 1.9.1. It is immediate to see that this $R_e(r|\mathbf{X})$ can be written as

$$R_e(r|\mathbf{X}) = \sup_{Q:\, D(Q\|P_X) < r} H(Q). \tag{1.9.21}$$

Fig. 1.13.

Now, set $R(r) = R_e(r|\mathbf{X})$ and define $\rho(R)$ as the inverse of $R(r)$ by

$$\rho(R) = \sup\{r \mid R_e(r|\mathbf{X}) < R\}. \tag{1.9.22}$$

Then, $\rho(R)$ is expressed as

$$\rho(R) = \inf_{Q:\, H(Q) \ge R} D(Q\|P_X)$$

from (1.9.21) (Fig. 1.14). This is the formula giving the supremum of

$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\varepsilon_n}$$

with respect to all $(n, M_n, \varepsilon_n)$-codes satisfying $\limsup_{n\to\infty} \frac{1}{n}\log M_n \le R$. This formula was first developed by Longo and Sgarro [62] based on an argument using types that is completely different from the argument given here. We also note that $\rho(R) = 0$ for $R \le H(P_X)$. □
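The two formulas of this example, the rate $\sup_{Q: D(Q\|P_X) < r} H(Q)$ and the exponent $\inf_{Q: H(Q) \ge R} D(Q\|P_X)$, are easy to evaluate for a binary alphabet. The following is a minimal sketch using a brute-force grid over binary distributions $Q$; the function names, the grid resolution, and the use of natural logarithms are illustrative assumptions rather than anything prescribed by the text.

```python
import math

def entropy(q):
    """Shannon entropy H(Q) in nats."""
    return -sum(x * math.log(x) for x in q if x > 0)

def divergence(q, p):
    """Divergence D(Q||P) in nats; P is assumed strictly positive."""
    return sum(x * math.log(x / y) for x, y in zip(q, p) if x > 0)

def rate_for_exponent(p1, r, grid=20000):
    """Grid approximation of sup{H(Q) : D(Q||P) < r} for P = (p1, 1 - p1)."""
    p = (p1, 1 - p1)
    best = 0.0
    for k in range(grid + 1):
        q = (k / grid, 1 - k / grid)
        if divergence(q, p) < r:
            best = max(best, entropy(q))
    return best

def exponent_for_rate(p1, rate, grid=20000):
    """Grid approximation of inf{D(Q||P) : H(Q) >= rate}."""
    p = (p1, 1 - p1)
    best = math.inf
    for k in range(grid + 1):
        q = (k / grid, 1 - k / grid)
        if entropy(q) >= rate:
            best = min(best, divergence(q, p))
    return best

h_p = entropy((0.2, 0.8))
print(rate_for_exponent(0.2, 1e-6) - h_p)   # close to 0: the rate tends to H(P) as r -> 0
print(exponent_for_rate(0.2, 0.6))          # positive, since 0.6 > H(P)
```

For rates at or below $H(P_X)$ the exponent returned is zero, matching the closing remark of the example.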



Example 1.9.2. Let $\mathcal{X}$ be a finite alphabet and suppose that a source $\mathbf{X} = (X_1, X_2, \ldots)$ is a first-order stationary irreducible Markov source subject to a transition probability $P(x_2|x_1) = \Pr\{X_2 = x_2 \mid X_1 = x_1\}$ ($x_1, x_2 \in \mathcal{X}$). Denote by $\mathcal{P}(\mathcal{X} \times \mathcal{X})$ the set of all probability distributions over $\mathcal{X} \times \mathcal{X}$. For any $Q \in \mathcal{P}(\mathcal{X} \times \mathcal{X})$ denote by

$$q(x_1) = \sum_{x_2 \in \mathcal{X}} Q(x_1, x_2), \qquad Q(x_2|x_1) = \frac{Q(x_1, x_2)}{q(x_1)}$$

the marginal distribution and the conditional probability distribution, respectively, and define the conditional entropy and the conditional divergence by

$$H(Q|q) = \sum_{x_1} q(x_1)\, H(Q(\cdot|x_1)), \tag{1.9.23}$$

$$D(Q\|P|q) = \sum_{x_1} q(x_1)\, D(Q(\cdot|x_1)\|P(\cdot|x_1)), \tag{1.9.24}$$

respectively. Let $\mathcal{P}_0$ denote the set of all $Q \in \mathcal{P}(\mathcal{X} \times \mathcal{X})$ with stationarity

$$\sum_{x \in \mathcal{X}} Q(x, x') = \sum_{x \in \mathcal{X}} Q(x', x) \quad (\forall x' \in \mathcal{X}).$$

Furthermore, define the (conditional) entropy of the stationary irreducible Markov source $\mathbf{X}$ by

$$H(P|p) = \sum_{x} p(x)\, H(P(\cdot|x)),$$

where $p(\cdot)$ denotes the stationary distribution of $P(\cdot|\cdot)$. If we apply Sanov's theorem (cf. Dembo and Zeitouni [22]) for stationary irreducible Markov sources similarly to Example 1.9.1, we have $\sigma(R) = 0$ for $R \le H(P|p)$ and

$$\sigma(R) = D(P_R\|P|p_R), \tag{1.9.25}$$

$$R - \sigma(R) = H(P_R|p_R) \tag{1.9.26}$$

for $R \ge H(P|p)$, where $P_R \in \mathcal{P}_0$ denotes the projection of $P$ to the plane

$$\pi_R = \left\{ Q \in \mathcal{P}_0 \,\middle|\, \sum_{x_1, x_2 \in \mathcal{X}} Q(x_1, x_2)\log\frac{1}{P(x_2|x_1)} = R \right\} \tag{1.9.27}$$

defined by

$$\inf_{Q \in \pi_R} D(Q\|P|q) = D(P_R\|P|p_R),$$

and $p_R$ denotes the marginal distribution of $P_R$. Here, notice that in (1.9.3) we have only to consider $R$ satisfying $R \ge H(P|p)$ since $\overline{H}(\mathbf{X}) = H(P|p)$ for the stationary irreducible Markov source $\mathbf{X}$ (see Remark 1.9.2). From Theorem 1.9.1 we obtain

$$R_e(r|\mathbf{X}) = \sup_{R \ge 0}\{H(P_R|p_R) \mid D(P_R\|P|p_R) < r\} = \sup_{Q \in \mathcal{P}_0:\, D(Q\|P|q) < r} H(Q|q), \tag{1.9.28}$$

which is a generalization of the formula obtained in Example 1.9.1.

Fig. 1.14. $\rho(R)$.

Now, set $R(r) = R_e(r|\mathbf{X})$ and define $\rho(R)$ as the inverse of $R(r)$ in the same way as (1.9.22). Then, (1.9.28) implies that $\rho(R)$ can be written in the following form:

$$\rho(R) = \inf_{Q \in \mathcal{P}_0:\, H(Q|q) \ge R} D(Q\|P|q).$$

This is the formula giving the supremum of

$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\varepsilon_n}$$

with respect to all $(n, M_n, \varepsilon_n)$-codes satisfying $\limsup_{n\to\infty} \frac{1}{n}\log M_n \le R$ for the irreducible Markov source. This formula was first developed by Davisson, Longo and Sgarro [21] based on an argument using Markov types that is completely different from the argument given here. See Natarajan [72] for another derivation of this formula. □
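The quantities (1.9.23) and (1.9.24) are straightforward to compute for a small chain. The following is a minimal sketch, assuming a two-state chain stored as a nested list with `P[i][j]` standing for $P(j|i)$ and natural logarithms; all function names are illustrative. It builds the stationary joint distribution of the chain and checks that its conditional divergence from $P$ vanishes, while its conditional entropy equals $H(P|p)$.

```python
import math

def stationary_two_state(P):
    """Stationary distribution of a 2-state chain with P[i][j] = P(j|i)."""
    a, b = P[0][1], P[1][0]          # both must be positive (irreducibility)
    return [b / (a + b), a / (a + b)]

def conditional_entropy(Q):
    """H(Q|q) of (1.9.23) for a joint distribution Q[x1][x2]."""
    total = 0.0
    for row in Q:
        q1 = sum(row)                # marginal q(x1)
        for v in row:
            if v > 0:
                total -= v * math.log(v / q1)
    return total

def conditional_divergence(Q, P):
    """D(Q||P|q) of (1.9.24) for a joint Q[x1][x2] and a transition matrix P."""
    total = 0.0
    for x1, row in enumerate(Q):
        q1 = sum(row)
        for x2, v in enumerate(row):
            if v > 0:
                total += v * math.log((v / q1) / P[x1][x2])
    return total

def joint_from_chain(P):
    """Joint distribution p(x1) P(x2|x1) under the stationary distribution p."""
    p = stationary_two_state(P)
    return [[p[i] * P[i][j] for j in range(2)] for i in range(2)]

P = [[0.9, 0.1], [0.4, 0.6]]
Q = joint_from_chain(P)
print(conditional_entropy(Q))        # the entropy rate H(P|p) in nats
print(conditional_divergence(Q, P))  # 0 up to rounding
```

Any other stationary joint distribution $Q \in \mathcal{P}_0$ can be passed to the same two functions to evaluate the objective and the constraint of (1.9.28).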



Example 1.9.3. Let us further generalize Example 1.9.2. First, let $\mathcal{X}$ and $\mathcal{S}$ be a source alphabet and a state alphabet, respectively, where both $\mathcal{X}$ and $\mathcal{S}$ are assumed to be finite sets. Let a source $\mathbf{X} = \{X^n = (X_1, X_2, \ldots, X_n)\}_{n=1}^{\infty}$ be a unifilar finite-state source subject to a probability distribution

$$P_{X^n}(\mathbf{x}) = P(\mathbf{x}|\mathbf{s}) = \prod_{i=1}^{n} P(x_i|s_i) \quad (\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n). \tag{1.9.29}$$

Here, $P(x|s)$ ($x \in \mathcal{X}$, $s \in \mathcal{S}$) denotes the (conditional) probability that the source generates $x \in \mathcal{X}$ under a state $s \in \mathcal{S}$, and $s_1 \in \mathcal{S}$ means an arbitrary fixed initial state known to an encoder. Suppose that a state transition function $f : \mathcal{X} \times \mathcal{S} \to \mathcal{S}$ sequentially determines a sequence of states $\mathbf{s} = (s_1, s_2, \ldots, s_n)$ from the initial state $s_1 \in \mathcal{S}$ by

$$s_{i+1} = f(x_i, s_i) \quad (i = 1, 2, \ldots, n-1).$$

Then, it is easy to see that a sequence of states $(s_1, s_2, \ldots, s_{n+1})$ is determined from a sequence of source outputs $(x_1, x_2, \ldots, x_n)$ under a fixed $s_1 \in \mathcal{S}$ (the term "unifilar" originates from this property). Now, denote by $\mathcal{S}_0$ the set of all states that are reachable from the initial state $s_1$ with "positive probabilities" and set

$$S' = f(X, S),$$

where $(X, S)$ is an arbitrary random variable taking values in $\mathcal{X} \times \mathcal{S}_0$. Let $\mathcal{U}_0$ denote all of the joint probabilities $P_{XS}(\cdot, \cdot)$ satisfying the stationary condition $P_{S'}(\cdot) = P_S(\cdot)$ and their probability transition matrices $P_{S'|S}(\cdot|\cdot)$ being irreducible. We define the projection $P_{X_R S_R} \in \mathcal{U}_0$ of $P(\cdot|\cdot)$ to a plane

$$\pi_R = \left\{ P_{XS} \in \mathcal{U}_0 \,\middle|\, \sum_{x \in \mathcal{X},\, s \in \mathcal{S}_0} P_{XS}(x, s)\log\frac{1}{P(x|s)} = R \right\}$$

by

$$\inf_{P_{XS} \in \pi_R} D(P_{XS}\|P|P_S) = D(P_{X_R S_R}\|P|P_{S_R}).$$

Then, similarly to Example 1.9.2, application of Sanov's theorem for unifilar finite-state sources (cf. Han [37]) yields

$$\sigma(R) = D(P_{X_R S_R}\|P|P_{S_R}), \tag{1.9.30}$$

$$R - \sigma(R) = H(P_{X_R S_R}|P_{S_R}). \tag{1.9.31}$$

Therefore, due to Theorem 1.9.1, we obtain the following formula for the unifilar finite-state source $\mathbf{X}$:

$$R_e(r|\mathbf{X}) = \sup_{R \ge 0}\{H(P_{X_R S_R}|P_{S_R}) \mid D(P_{X_R S_R}\|P|P_{S_R}) < r\} = \sup_{P_{XS} \in \mathcal{U}_0:\, D(P_{XS}\|P|P_S) < r} H(P_{XS}|P_S). \tag{1.9.32}$$

Here, set $R(r) = R_e(r|\mathbf{X})$ and define $\rho(R)$ as the inverse of $R(r)$ in the same way as (1.9.22). Then, (1.9.32) implies that $\rho(R)$ can be written in the following form:

$$\rho(R) = \inf_{P_{XS} \in \mathcal{U}_0:\, H(P_{XS}|P_S) \ge R} D(P_{XS}\|P|P_S).$$

This is the formula giving the supremum of

$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\varepsilon_n}$$

with respect to all $(n, M_n, \varepsilon_n)$-codes satisfying $\limsup_{n\to\infty} \frac{1}{n}\log M_n \le R$ for the finite-state source (see Merhav [66] for the case of variable-length coding).

Note that the unifilar finite-state source defined above is a (nonergodic) "mixed source" of the asymptotically stationary (or asymptotically periodic) irreducible sources in general. This is the reason why we assumed that the probability transition matrix $P_{S'|S}(\cdot|\cdot)$ is irreducible (cf. Han [37]). Here, the irreducible source means the source not decomposed into the form of the mixed source given in (1.4.1) in §1.4.

In particular, a "stationary irreducible source" with a countably infinite alphabet $\mathcal{X}$ actually means the "stationary ergodic source" that we have already described (cf. Gray and Davisson [33]).

If we consider the infimum achievable fixed-length coding rate $R_f(\mathbf{X})$ (see Definition 1.1.2 in §1.1) for this type of unifilar finite-state source $\mathbf{X}$, it is easy to verify that the following formula holds:

$$R_f(\mathbf{X}) = \sup_{P_{XS} \in \overline{\mathcal{U}}_0} H(P_{XS}|P_S), \tag{1.9.33}$$

where $\overline{\mathcal{U}}_0$ denotes the set of all joint probability distributions $P_{XS}(\cdot, \cdot) \in \mathcal{U}_0$ satisfying $P_{X|S}(\cdot|\cdot) = P(\cdot|\cdot)$ and we use the same notation as (1.9.32). □
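The unifilar property, namely that the state sequence is a deterministic function of the outputs and the initial state, can be illustrated directly. The following is a minimal sketch of (1.9.29), assuming the conditional probabilities are stored as a nested dictionary `P[s][x]` for $P(x|s)$; the transition function `f` below (next state equals the last output, which reduces to a first-order Markov source) and the numerical values are hypothetical choices for illustration.

```python
import itertools
import math

def state_sequence(x, s1, f):
    """Return (s_1, ..., s_{n+1}) determined by s_{i+1} = f(x_i, s_i)."""
    states = [s1]
    for symbol in x:
        states.append(f(symbol, states[-1]))
    return states

def sequence_probability(x, s1, f, P):
    """P_{X^n}(x) = prod_i P(x_i | s_i), as in (1.9.29)."""
    states = state_sequence(x, s1, f)
    # zip stops at len(x), so the final state s_{n+1} is not used here.
    return math.prod(P[s][symbol] for symbol, s in zip(x, states))

# Hypothetical example: the state is simply the previous output symbol.
f = lambda symbol, state: symbol
P = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}

print(state_sequence((1, 0, 1), 0, f))
total = sum(sequence_probability(x, 0, f, P)
            for x in itertools.product((0, 1), repeat=3))
print(total)   # the probabilities of all length-3 sequences sum to 1 up to rounding
```

Because the states are recomputed from the outputs alone, an encoder that knows $s_1$ can evaluate $P_{X^n}(\mathbf{x})$ without any side information.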



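Example 1.9.4. Suppose that $\mathcal{X}$ is a finite alphabet again and consider the mixed source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ given in Example 1.4.1. First, we partition $\mathcal{P}(\mathcal{X})$, the space of probability distributions over $\mathcal{X}$, into the two half-spaces as follows:

$$\nu_1 = \left\{ Q \in \mathcal{P}(\mathcal{X}) \,\middle|\, \sum_{x \in \mathcal{X}} Q(x)\log\frac{P_1(x)}{P_2(x)} \ge 0 \right\}, \tag{1.9.34}$$

$$\nu_2 = \left\{ Q \in \mathcal{P}(\mathcal{X}) \,\middle|\, \sum_{x \in \mathcal{X}} Q(x)\log\frac{P_1(x)}{P_2(x)} < 0 \right\}. \tag{1.9.35}$$

Denoting by $\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$ and $\mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}$ two stationary memoryless sources subject to probability distributions $P_1$ and $P_2$, respectively, the mixed source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ has the probability distribution

$$P_{X^n}(\mathbf{x}) = \alpha_1 P_{X_1^n}(\mathbf{x}) + \alpha_2 P_{X_2^n}(\mathbf{x}) \quad (\forall \mathbf{x} \in \mathcal{X}^n) \tag{1.9.36}$$

from its definition. We notice here that by using the type $T_{\mathbf{x}}$ of $\mathbf{x} \in \mathcal{X}^n$, $P_{X_1^n}$ and $P_{X_2^n}$ can be expressed as follows:

$$P_{X_1^n}(\mathbf{x}) = \exp[-n(D(T_{\mathbf{x}}\|P_1) + H(T_{\mathbf{x}}))],$$

$$P_{X_2^n}(\mathbf{x}) = \exp[-n(D(T_{\mathbf{x}}\|P_2) + H(T_{\mathbf{x}}))],$$

respectively (cf. Csiszár and Körner [19]). These equations mean that

$$P_{X_1^n}(\mathbf{x}) \ge P_{X_2^n}(\mathbf{x}) \quad \text{if } T_{\mathbf{x}} \in \nu_1, \tag{1.9.37}$$

$$P_{X_1^n}(\mathbf{x}) < P_{X_2^n}(\mathbf{x}) \quad \text{if } T_{\mathbf{x}} \in \nu_2. \tag{1.9.38}$$

Hence, it follows from (1.9.36) that

$$\alpha_1 P_{X_1^n}(\mathbf{x}) \le P_{X^n}(\mathbf{x}) \le P_{X_1^n}(\mathbf{x}) \quad \text{if } T_{\mathbf{x}} \in \nu_1, \tag{1.9.39}$$

$$\alpha_2 P_{X_2^n}(\mathbf{x}) \le P_{X^n}(\mathbf{x}) \le P_{X_2^n}(\mathbf{x}) \quad \text{if } T_{\mathbf{x}} \in \nu_2. \tag{1.9.40}$$

Now, define the two half-spaces of $\mathcal{P}(\mathcal{X})$ by

$$\pi_R^{(1)} = \left\{ Q \in \mathcal{P}(\mathcal{X}) \,\middle|\, \sum_{x \in \mathcal{X}} Q(x)\log\frac{1}{P_1(x)} \ge R \right\},$$

$$\pi_R^{(2)} = \left\{ Q \in \mathcal{P}(\mathcal{X}) \,\middle|\, \sum_{x \in \mathcal{X}} Q(x)\log\frac{1}{P_2(x)} \ge R \right\},$$

and denote by $P_R^{(1)}$ and $P_R^{(2)}$ the projections of $P_1$ and $P_2$ to $\nu_1 \cap \pi_R^{(1)}$ and $\nu_2 \cap \pi_R^{(2)}$, respectively. By taking (1.9.39) and (1.9.40) into consideration and applying Sanov's theorem, we obtain

$$\sigma(R) = \min\left(D(P_R^{(1)}\|P_1),\ D(P_R^{(2)}\|P_2)\right). \tag{1.9.41}$$

We can compute values of $R_e(r|\mathbf{X})$ as a function of $r$ by substituting this $\sigma(R)$ into the right-hand side of (1.9.3) in Theorem 1.9.1.

Here, it is obvious from (1.9.41) that $\sigma(R) = 0$ for $R \le \max(H(P_1), H(P_2))$ and $\sigma(R)$ is a monotone increasing and continuous function of $R$. Therefore, we have

$$R_e(r|\mathbf{X}) \ge \max(H(P_1), H(P_2)) \quad (\forall r > 0). \tag{1.9.42}$$

On the other hand, since (1.9.41) implies that $\sigma(h) > 0$ for an arbitrary rate $h$ satisfying $h > \max(H(P_1), H(P_2))$, it holds that

$$\sup_{R \ge 0}\{R - \sigma(R) \mid \sigma(R) < \sigma(h)\} \le h,$$

which means that $h$ is $\sigma(h)$-achievable. Consequently, we obtain

$$\lim_{r \downarrow 0} R_e(r|\mathbf{X}) = \max(H(P_1), H(P_2)). \tag{1.9.43}$$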

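Remark 1.9.3. In fact, Example 1.9.4 can be generalized in a much simpler form without computing the information-spectrum. That is, we actually have the formula

$$R_e(r|\mathbf{X}) = \max(R_e(r|\mathbf{X}_1), R_e(r|\mathbf{X}_2)) \quad (\forall r > 0) \tag{1.9.44}$$

for the mixed source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ of two general sources $\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$ and $\mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}$ defined by (1.9.36) with a countably infinite alphabet $\mathcal{X}$. This formula can be verified in the following way. First, we arbitrarily choose $R_1$ and $R_2$ satisfying

$$R_1 > R_e(r|\mathbf{X}_1), \quad R_2 > R_e(r|\mathbf{X}_2). \tag{1.9.45}$$

Then, the definitions of $R_e(r|\mathbf{X}_1)$ and $R_e(r|\mathbf{X}_2)$ guarantee that there exist an $(n, M_n^{(1)}, \varepsilon_n^{(1)})$-code for the source $\mathbf{X}_1$ satisfying

$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\varepsilon_n^{(1)}} \ge r, \quad \limsup_{n\to\infty} \frac{1}{n}\log M_n^{(1)} \le R_1 \tag{1.9.46}$$

and an $(n, M_n^{(2)}, \varepsilon_n^{(2)})$-code for the source $\mathbf{X}_2$ satisfying

$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\varepsilon_n^{(2)}} \ge r, \quad \limsup_{n\to\infty} \frac{1}{n}\log M_n^{(2)} \le R_2. \tag{1.9.47}$$

Denote by $(\varphi_n^{(1)}, \psi_n^{(1)})$ and $(\varphi_n^{(2)}, \psi_n^{(2)})$ the pairs of an encoder and a decoder corresponding to these codes. By setting

$$B_n^{(1)} = \{\mathbf{x} \in \mathcal{X}^n \mid \mathbf{x} = \psi_n^{(1)}(\varphi_n^{(1)}(\mathbf{x}))\},$$

$$B_n^{(2)} = \{\mathbf{x} \in \mathcal{X}^n \mid \mathbf{x} = \psi_n^{(2)}(\varphi_n^{(2)}(\mathbf{x}))\},$$

$$B_n = B_n^{(1)} \cup B_n^{(2)}$$

and noticing

$$|B_n| \le M_n \equiv M_n^{(1)} + M_n^{(2)}, \tag{1.9.48}$$

we can construct an $(n, M_n, \varepsilon_n)$-code $(\varphi_n, \psi_n)$ for the mixed source $\mathbf{X}$ satisfying

$$B_n = \{\mathbf{x} \in \mathcal{X}^n \mid \mathbf{x} = \psi_n(\varphi_n(\mathbf{x}))\}, \tag{1.9.49}$$

$$\varepsilon_n = P_{X^n}(B_n^c). \tag{1.9.50}$$

For example, we can define an encoder $\varphi_n$ as the mapping satisfying $\varphi_n(\mathbf{x}) = \varphi_n^{(1)}(\mathbf{x})$ for $\mathbf{x} \in B_n^{(1)}$ and $\varphi_n(\mathbf{x}) = \varphi_n^{(2)}(\mathbf{x}) + M_n^{(1)}$ for $\mathbf{x} \in B_n^{(2)} - B_n^{(1)}$ ($\varphi_n$ is one-to-one on $B_n$) and a decoder $\psi_n$ as the inverse mapping of $\varphi_n$ ($\psi_n$ is one-to-one on $\varphi_n(B_n)$). Then, since it follows that

$$\begin{aligned}
\varepsilon_n &= \alpha_1 P_{X_1^n}(B_n^c) + \alpha_2 P_{X_2^n}(B_n^c) \\
&\le \alpha_1 P_{X_1^n}((B_n^{(1)})^c) + \alpha_2 P_{X_2^n}((B_n^{(2)})^c) \\
&= \alpha_1 \varepsilon_n^{(1)} + \alpha_2 \varepsilon_n^{(2)} \\
&\le \max(\varepsilon_n^{(1)}, \varepsilon_n^{(2)}),
\end{aligned}$$

we obtain

$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\varepsilon_n} \ge r \tag{1.9.51}$$

from the first inequalities in (1.9.46) and (1.9.47). On the other hand, (1.9.48) and the second inequalities of (1.9.46) and (1.9.47) yield

$$\limsup_{n\to\infty} \frac{1}{n}\log M_n \le \max\left( \limsup_{n\to\infty} \frac{1}{n}\log M_n^{(1)},\ \limsup_{n\to\infty} \frac{1}{n}\log M_n^{(2)} \right) \le \max(R_1, R_2). \tag{1.9.52}$$

By summarizing (1.9.51) and (1.9.52), we have

$$R_e(r|\mathbf{X}) \le \max(R_1, R_2). \tag{1.9.53}$$

Equation (1.9.53) means

$$R_e(r|\mathbf{X}) \le \max(R_e(r|\mathbf{X}_1), R_e(r|\mathbf{X}_2)) \tag{1.9.54}$$

because $R_1$ and $R_2$ are arbitrary as far as they satisfy (1.9.45).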
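The merging of the two codes used in this part of the argument can be written out concretely. The following is a minimal sketch, assuming each component code is given as a pair of Python callables with encoder outputs in $\{1, \ldots, M\}$; the function name `merge_codes` and the toy codes below are hypothetical and serve only to illustrate the index shift by $M_n^{(1)}$.

```python
def merge_codes(enc1, dec1, m1, enc2, dec2, m2, sequences):
    """Combine two fixed-length codes into one code with m1 + m2 codewords.

    Returns (encode, decode, b) where b is the set of correctly decoded
    sequences of the merged code, i.e. the union of the two correct sets.
    """
    b1 = {x for x in sequences if dec1(enc1(x)) == x}
    b2 = {x for x in sequences if dec2(enc2(x)) == x}

    def encode(x):
        if x in b1:
            return enc1(x)               # indices 1, ..., m1
        if x in b2:
            return enc2(x) + m1          # indices m1 + 1, ..., m1 + m2
        return 1                         # arbitrary index for uncovered x

    def decode(m):
        return dec1(m) if m <= m1 else dec2(m - m1)

    return encode, decode, b1 | b2

# Toy codes on binary strings of length 2 (purely illustrative).
sequences = ["00", "01", "10", "11"]
enc1 = lambda x: {"00": 1, "01": 2}.get(x, 1)
dec1 = lambda m: {1: "00", 2: "01"}[m]
enc2 = lambda x: {"01": 1, "11": 2}.get(x, 1)
dec2 = lambda m: {1: "01", 2: "11"}[m]

encode, decode, b = merge_codes(enc1, dec1, 2, enc2, dec2, 2, sequences)
print(sorted(b))                               # ['00', '01', '11']
print([decode(encode(x)) == x for x in sequences])
```

The merged code decodes correctly exactly on $B_n = B_n^{(1)} \cup B_n^{(2)}$, so its error set is contained in each component's error set, which is what the chain of inequalities for $\varepsilon_n$ relies on.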

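Next, in order to develop the inequality in the opposite direction, suppose that $R$ is an arbitrary $r$-achievable rate for the mixed source $\mathbf{X}$. That is, suppose that there exists an $(n, M_n, \varepsilon_n)$-code satisfying

$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\varepsilon_n} \ge r, \quad \limsup_{n\to\infty} \frac{1}{n}\log M_n \le R. \tag{1.9.55}$$

Denote by $\varphi_n$ and $\psi_n$ the encoder and the decoder corresponding to this code. Setting

$$B_n = \{\mathbf{x} \in \mathcal{X}^n \mid \mathbf{x} = \psi_n(\varphi_n(\mathbf{x}))\},$$

we have

$$\varepsilon_n = P_{X^n}(B_n^c) = \alpha_1 P_{X_1^n}(B_n^c) + \alpha_2 P_{X_2^n}(B_n^c).$$

Therefore,

$$\varepsilon_n^{(1)} \equiv P_{X_1^n}(B_n^c) \le \frac{\varepsilon_n}{\alpha_1}, \tag{1.9.56}$$

$$\varepsilon_n^{(2)} \equiv P_{X_2^n}(B_n^c) \le \frac{\varepsilon_n}{\alpha_2}. \tag{1.9.57}$$

Then, the first inequality in (1.9.55) leads to

$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\varepsilon_n^{(1)}} \ge r, \tag{1.9.58}$$

$$\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\varepsilon_n^{(2)}} \ge r. \tag{1.9.59}$$

Combining these inequalities with the second inequality in (1.9.55), it turns out that the rate $R$ is $r$-achievable for the sources $\mathbf{X}_1$ and $\mathbf{X}_2$ under the code $(\varphi_n, \psi_n)$. Hence, we obtain

$$R \ge \max(R_e(r|\mathbf{X}_1), R_e(r|\mathbf{X}_2)). \tag{1.9.60}$$

Since $R$ is an arbitrary $r$-achievable rate for the mixed source $\mathbf{X}$, (1.9.60) implies

$$R_e(r|\mathbf{X}) \ge \max(R_e(r|\mathbf{X}_1), R_e(r|\mathbf{X}_2)). \tag{1.9.61}$$

This completes the proof of (1.9.44).

If we consider the mixed source in Example 1.9.4 as a special case of the formula (1.9.44), we have the following simple formula from (1.9.21) in Example 1.9.1:

$$R_e(r|\mathbf{X}) = \max\left( \sup_{Q:\, D(Q\|P_1) < r} H(Q),\ \sup_{Q:\, D(Q\|P_2) < r} H(Q) \right). \tag{1.9.62}$$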



Example 1.9.5. Let us consider the case that in Remark 1.9.3 $\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$ and $\mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}$ are first-order stationary irreducible Markov sources with a finite alphabet $\mathcal{X}$ subject to transition probabilities $P_1(\cdot|\cdot)$ and $P_2(\cdot|\cdot)$. Then, we have the formula

$$R_e(r|\mathbf{X}) = \max(R_e(r|\mathbf{X}_1), R_e(r|\mathbf{X}_2)) = \max\left( \sup_{Q \in \mathcal{P}_0:\, D(Q\|P_1|q) < r} H(Q|q),\ \sup_{Q \in \mathcal{P}_0:\, D(Q\|P_2|q) < r} H(Q|q) \right)$$

for the mixed source $\mathbf{X}$ of $\mathbf{X}_1$ and $\mathbf{X}_2$, which is obtained by substituting (1.9.28) in Example 1.9.2 into (1.9.44) in Remark 1.9.3. Similarly, for a unifilar finite-state source $\mathbf{X}$ we have

$$R_e(r|\mathbf{X}) = \max(R_e(r|\mathbf{X}_1), R_e(r|\mathbf{X}_2)) = \max\left( \sup_{P_{XS} \in \mathcal{U}_0^{(1)}:\, D(P_{XS}\|P_1|P_S) < r} H(P_{XS}|P_S),\ \sup_{P_{XS} \in \mathcal{U}_0^{(2)}:\, D(P_{XS}\|P_2|P_S) < r} H(P_{XS}|P_S) \right)$$

from (1.9.32) in Example 1.9.3. Here, $P_1(x|s)$ and $P_2(x|s)$ (corresponding to $P(x|s)$) denote the probability of $x \in \mathcal{X}$ given a state $s \in \mathcal{S}$, and $\mathcal{U}_0^{(1)}$ and $\mathcal{U}_0^{(2)}$ (corresponding to $\mathcal{U}_0$) denote the sets of all joint probability distributions $P_{XS}(\cdot, \cdot)$ satisfying the stationarity and the irreducibility conditions under state transition functions $f_1$ and $f_2$ (corresponding to $f$). □



Example 1.9.6. Now, let us consider a source with a countably infinite alphabet $\mathcal{X} = \{1, 2, \ldots\}$. Though Sanov's theorem, used in Examples 1.9.1-1.9.5, does not always hold from the computational point of view, we can use Cramér's Theorem (cf. Dembo and Zeitouni [22]) on the large deviation, which always holds. First, fix a probability distribution $P = (p_1, p_2, \ldots)$ over $\mathcal{X}$ arbitrarily and denote by $X$ the random variable equal to $k$ with probability $p_k$ for each $k = 1, 2, \ldots$. Denoting by $\mathbf{X} = \{X^n = (X_1, X_2, \ldots, X_n)\}_{n=1}^{\infty}$ the stationary memoryless source specified by $X$, the entropy density rate is written as

$$\frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} = \frac{1}{n}\sum_{i=1}^{n}\log\frac{1}{P_{X_i}(X_i)}. \tag{1.9.63}$$

Thus, $\sigma(R)$ in (1.9.2) can be expressed as

$$\sigma(R) = \inf_{x \ge R} I(x), \tag{1.9.64}$$

where $I(x)$ denotes the large deviation rate function of (1.9.63). Cramér's theorem tells us that the rate function $I(x)$ is given by

$$I(x) = \sup_{\theta}\,(\theta x - \Lambda(\theta)), \tag{1.9.65}$$

where $\Lambda(\theta) = \log M(\theta)$ and $M(\theta)$ denotes the moment generating function of $\log\frac{1}{P_X(X)}$ defined by

$$M(\theta) = \mathrm{E}\, e^{\theta\log\frac{1}{P_X(X)}} = \sum_{i=1}^{\infty} p_i^{1-\theta} \tag{1.9.66}$$

(here, $\log M(\theta)/\theta$ is usually called the Rényi $\theta$-entropy (cf. Csiszár and Körner [19])). Notice that (1.9.64) implies that $\sigma(R) = 0$ for $R \le H(P)$ and $\sigma(R) = I(R)$ for $R \ge H(P)$ because the expectation of $\log\frac{1}{P_X(X)}$ equals the entropy as follows:

$$\mathrm{E}\left[\log\frac{1}{P_X(X)}\right] = \sum_{i=1}^{\infty} p_i\log\frac{1}{p_i} = H(P)$$

($I(x)$ is monotone increasing for $x \ge H(P)$, monotone decreasing for $x \le H(P)$ and $I(x) = 0$ at $x = H(P)$.) Accordingly, by substituting this into (1.9.3) in Theorem 1.9.1, we obtain the formula giving values of $R_e(r|\mathbf{X})$.

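If we substitute (1.9.66) into (1.9.65), setting $x = R$, we have

$$I(R) = \sup_{\theta}\left( \theta R - \log\sum_{i=1}^{\infty} p_i^{1-\theta} \right). \tag{1.9.67}$$

We can compute $I(R)$ by using this equation. To this end, we differentiate the right-hand side with respect to $\theta$ and set it to 0. Then, we obtain the following equation in $\theta$:

$$R = \frac{\sum_{i=1}^{\infty} p_i^{1-\theta}\log\frac{1}{p_i}}{\sum_{i=1}^{\infty} p_i^{1-\theta}} \equiv \varphi(\theta). \tag{1.9.68}$$

Since $\varphi(\theta)$ on the right-hand side turns out to be a monotone increasing and continuous function of $\theta$ from the term-by-term differentiability of $M(\theta)$ (cf. Dembo and Zeitouni [22]), as is easily verified from the Schwarz inequality (cf. Gallager [30]), $\mathcal{D}$ defined by $\mathcal{D} \equiv \{\varphi(\theta) \mid -\infty < \varphi(\theta) < +\infty\}$ forms an interval on the real line. Therefore, if $R \in \mathcal{D}$, $I(R)$ can be computed as

$$I(R) = \theta R - \log\sum_{i=1}^{\infty} p_i^{1-\theta}, \tag{1.9.69}$$

where $\theta$ is determined by (1.9.68). In this case, it is easy to check that

$$I(R) = D(Q_R\|P), \tag{1.9.70}$$

$$Q_R(i) = \frac{p_i^{1-\theta}}{\sum_{j=1}^{\infty} p_j^{1-\theta}} \tag{1.9.71}$$

for $\theta$ satisfying the equation (1.9.68), where $\mathcal{P}(\mathcal{X})$ denotes the set of all probability distributions over $\mathcal{X}$ and $Q_R$ means the projection of $P$ to the plane

$$\left\{ Q \in \mathcal{P}(\mathcal{X}) \,\middle|\, \sum_{i=1}^{\infty} Q(i)\log\frac{1}{p_i} = R \right\}.$$

Thus, for the case of $R \in \mathcal{D}$ Cramér's theorem is reduced to Sanov's theorem used in Example 1.9.1 (finite alphabet case). However, for $R$ satisfying $R \notin \mathcal{D}$ an equality such as (1.9.70) does not hold.

We note here that, if $\mathcal{X}$ is a finite source alphabet, (1.9.68) tells us that

$$\mathcal{D} = \left( \log\frac{1}{p_{\sup}},\ \log\frac{1}{p_{\inf}} \right),$$

where

$$p_{\sup} = \sup_{i} p_i, \quad p_{\inf} = \inf_{i:\, p_i > 0} p_i.$$

On the other hand, if $\mathcal{X}$ is a countably infinite source alphabet, we have

$$\mathcal{D} \subset \left( \log\frac{1}{p_{\sup}},\ \log\frac{1}{p_{\inf}} = \infty \right)$$

in general.

Hence, it is important to know what kind of interval $\mathcal{D}$ forms. In particular, we can show that

$$\left( \log\frac{1}{p_{\sup}},\ H(P) \right] \subset \mathcal{D} \tag{1.9.72}$$

always holds provided that $H(P) < +\infty$. Generally speaking, if $P = (p_1, p_2, \ldots)$ satisfies $p_i \to 0$ as $i \to \infty$ faster than with a certain speed, we have

$$\mathcal{D} = \left( \log\frac{1}{p_{\sup}},\ \infty \right),$$

and otherwise we have

$$\mathcal{D} = \left( \log\frac{1}{p_{\sup}},\ H(P) \right].$$

We will see these two situations in the following two examples. □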
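For a finite distribution the computation described in this example can be carried out numerically. The following is a minimal sketch, assuming a strictly positive finite distribution `p`, natural logarithms, and bisection on a bracketing interval for $\theta$ that is wide enough for the chosen $R$; the function names and the interval $[-50, 50]$ are illustrative assumptions. It solves $\varphi(\theta) = R$, evaluates $I(R) = \theta R - \log\sum_i p_i^{1-\theta}$, and compares the result with $D(Q_R\|P)$ for the tilted distribution $Q_R(i) \propto p_i^{1-\theta}$.

```python
import math

def phi(p, theta):
    """Right-hand side of the stationarity equation: tilted mean of log(1/p_i)."""
    w = [pi ** (1 - theta) for pi in p]
    return sum(wi * math.log(1 / pi) for wi, pi in zip(w, p)) / sum(w)

def solve_theta(p, rate, lo=-50.0, hi=50.0, iters=200):
    """Bisection for phi(theta) = rate; phi is monotone increasing in theta."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if phi(p, mid) < rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def rate_function(p, rate):
    """I(R) = theta*R - log sum_i p_i^(1-theta) at the solving theta."""
    theta = solve_theta(p, rate)
    log_m = math.log(sum(pi ** (1 - theta) for pi in p))
    return theta * rate - log_m, theta

def tilted(p, theta):
    """Tilted distribution proportional to p_i^(1-theta)."""
    w = [pi ** (1 - theta) for pi in p]
    total = sum(w)
    return [wi / total for wi in w]

def divergence(q, p):
    return sum(x * math.log(x / y) for x, y in zip(q, p) if x > 0)

p = [0.5, 0.3, 0.2]
value, theta = rate_function(p, 1.3)       # 1.3 lies inside (log 2, log 5)
print(value, divergence(tilted(p, theta), p))
```

The two printed numbers agree up to rounding, which is the finite-alphabet reduction of Cramér's theorem to Sanov's theorem described in the text.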



Example 1.9.7. Let us consider the case that $P = (p_1, p_2, \ldots)$ in Example 1.9.6 is the geometric distribution

$$p_i = (1 - a)\,a^{i-1} \quad (0 < a < 1;\ i = 1, 2, \ldots).$$

This is a typical example of probability distributions with $p_i$ rapidly going to 0 as $i \to \infty$. The entropy is written as $H(P) = \frac{-a\log a - (1-a)\log(1-a)}{1-a}$. Simple computation tells us that for this geometric distribution we have

$$\log M(\theta) = (1 - \theta)\log(1 - a) - \log(1 - a^{1-\theta}) \quad (\forall \theta < 1),$$

which implies that

$$\mathcal{D} = \left( \log\frac{1}{p_{\sup}},\ \infty \right).$$

Hence, if $X$ is subject to the geometric distribution, (1.9.70) always holds. This means that the formula (1.9.21) given in Example 1.9.1 and developed by using Sanov's theorem is still valid, where we use $P$ instead of $P_X$. □
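The closed form of $\log M(\theta)$ for the geometric distribution can be checked against a truncated version of the defining series $\sum_i p_i^{1-\theta}$. The following is a minimal sketch, assuming $p_i = (1-a)a^{i-1}$ for $i = 1, 2, \ldots$ and natural logarithms; the function names and the truncation length are illustrative assumptions.

```python
import math

def log_mgf_closed_form(a, theta):
    """(1 - theta) log(1 - a) - log(1 - a^(1 - theta)), valid for theta < 1."""
    return (1 - theta) * math.log(1 - a) - math.log(1 - a ** (1 - theta))

def log_mgf_truncated(a, theta, terms=5000):
    """log of sum_{i=1}^{terms} p_i^(1-theta) with p_i = (1-a) a^(i-1)."""
    total = 0.0
    for i in range(1, terms + 1):
        log_p = math.log(1 - a) + (i - 1) * math.log(a)
        total += math.exp((1 - theta) * log_p)   # underflows harmlessly to 0.0
    return math.log(total)

for theta in (-1.0, 0.0, 0.5, 0.9):
    print(theta, log_mgf_closed_form(0.5, theta), log_mgf_truncated(0.5, theta))
```

As $\theta \uparrow 1$ the series converges more and more slowly, so a longer truncation would be needed near that boundary.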



Example 1.9.8. Let us consider the case that $P = (p_1, p_2, \ldots)$ in Example 1.9.6 is given by

$$p_i = \frac{c}{(i+1)(\log(i+1))^4}, \tag{1.9.73}$$

where $c > 0$ denotes the normalization constant. This is a typical example of a probability distribution with $p_i$ very slowly going to 0 as $i \to \infty$. Since we have

$$\sum_{i=1}^{\infty} p_i^{1-\theta}: \quad \text{convergent (if } \theta \le 0), \quad \text{divergent (if } \theta > 0),$$

$$\sum_{i=1}^{\infty} p_i^{1-\theta}\log\frac{1}{p_i}: \quad \text{convergent (if } \theta \le 0), \quad \text{divergent (if } \theta > 0),$$

$\mathcal{D}$ can be expressed as

$$\mathcal{D} = \left( \log\frac{1}{p_{\sup}},\ H(P) \right].$$

We notice here that the entropy of the probability distribution (1.9.73) is finite. By simple computation it turns out that

$$I(R) \begin{cases} = +\infty & \left(\text{if } R < \log\frac{1}{p_{\sup}}\right), \\ \text{is monotone decreasing} & \left(\text{if } \log\frac{1}{p_{\sup}} < R < H(P)\right), \\ = 0 & (\text{if } R \ge H(P)). \end{cases}$$

That is, if the probability distribution is given by (1.9.73), (1.9.70) does not hold for $R > H(P)$. Hence, in this case we cannot reduce Cramér's theorem to Sanov's theorem. In addition, we have $\sigma(R) = 0$ for all $R$, which shows that $R_e(r|\mathbf{X}) = \infty$ for all $r > 0$. This means that, as far as we consider a code of size $M_n$ that grows in exponential order of block length $n$, there is no $(n, M_n, \varepsilon_n)$-code with the error probability $\varepsilon_n$ converging to 0 in the exponential order of $n$. This kind of phenomenon never occurs if $\mathcal{X}$ is a finite source alphabet. □



Example 1.9.9. Let us define a source

$$\mathbf{X} = \{X^n = (X_1, X_2, \ldots, X_n)\}_{n=1}^{\infty}$$

by using the autoregressive process $(X_1, X_2, \ldots)$ generated according to

$$X_n = aX_{n-1} + W_n \quad (0 < a < 1;\ n = 1, 2, \ldots),$$

where $X_0 = 0$ and $(W_1, W_2, \ldots)$ is the stationary memoryless process subject to a probability distribution $P_W$ over a finite alphabet $\mathcal{W}$. Here, $\mathcal{W}$ is a finite set of real numbers. Since there is a one-to-one correspondence between $X^n = (X_1, \ldots, X_n)$ and $W^n = (W_1, \ldots, W_n)$, the information-spectrum of

$$\frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \tag{1.9.74}$$

coincides with the information-spectrum of

$$\frac{1}{n}\log\frac{1}{P_{W^n}(W^n)} = \frac{1}{n}\sum_{i=1}^{n}\log\frac{1}{P_{W_i}(W_i)}. \tag{1.9.75}$$

Notice that each term on the right-hand side of (1.9.75) is independent and identically distributed. By applying Sanov's theorem similarly to Example 1.9.1, it turns out that $R_e(r|\mathbf{X})$ is given by

$$R_e(r|\mathbf{X}) = \sup_{Q:\, D(Q\|P_W) < r} H(Q),$$

where $Q$ denotes a probability distribution over $\mathcal{W}$.

It is important to note that, even if $\mathcal{W}$ is a countably infinite alphabet, we can actually compute values of $R_e(r|\mathbf{X})$ for an autoregressive process $\mathbf{X} = (X_1, X_2, \ldots)$ by using an argument based on Cramér's theorem similar to Examples 1.9.6-1.9.8. □
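The one-to-one correspondence between $X^n$ and $W^n$ that this example rests on can be made explicit: the driving sequence is recovered by $W_n = X_n - aX_{n-1}$ with $X_0 = 0$. The following is a minimal sketch; the function names and the sample values are illustrative assumptions.

```python
def generate(w, a):
    """X_n = a * X_{n-1} + W_n with X_0 = 0."""
    x, prev = [], 0.0
    for wn in w:
        prev = a * prev + wn
        x.append(prev)
    return x

def recover_noise(x, a):
    """Invert the recursion: W_n = X_n - a * X_{n-1} with X_0 = 0."""
    w, prev = [], 0.0
    for xn in x:
        w.append(xn - a * prev)
        prev = xn
    return w

w = [1.0, -1.0, 1.0, 1.0, -1.0]
x = generate(w, 0.5)
print(x)
print(recover_noise(x, 0.5))   # reproduces w up to rounding
```

Since the map is invertible, $P_{X^n}(X^n) = P_{W^n}(W^n)$, which is why the two information-spectra coincide.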



Example 1.9.10. Let us consider the general source $\mathbf{Z} = \{Z_n\}_{n=1}^{\infty}$ subject to the geometric distribution

$$P_{Z_n}(i) = (1 - a_n)\,a_n^{i} \quad (i = 0, 1, 2, \ldots), \tag{1.9.76}$$

where

$$a_n = 1 - e^{-\alpha n} \quad (\alpha > 0).$$

Note that this source is not a stationary memoryless source, a mixed source or a stationary irreducible Markov source treated in Examples 1.9.1-1.9.8. In this case, simple computation tells us that

$$\sigma(R) = \begin{cases} 0 & \text{for } 0 \le R \le \alpha, \\ R - \alpha & \text{for } R > \alpha. \end{cases}$$

Therefore, we obtain

$$R_e(r|\mathbf{Z}) = \sup_{R \ge 0}\{R - \sigma(R) \mid \sigma(R) < r\} = \alpha \quad (\forall r > 0).$$
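The "simple computation" behind $\sigma(R)$ in this example can be reproduced numerically. The following is a minimal sketch under the assumption that $a_n = 1 - e^{-\alpha n}$ (the choice consistent with the stated result $\sigma(R) = R - \alpha$) and with natural logarithms; the function name `sigma_n` is hypothetical. It uses the tail formula $\Pr\{Z_n \ge k\} = a_n^k$ of the geometric distribution to evaluate the finite-$n$ exponent exactly.

```python
import math

def sigma_n(rate, alpha, n):
    """(1/n) log 1/Pr{(1/n) log 1/P(Z_n) >= rate} for the geometric source.

    Assumes P(i) = (1 - a_n) a_n^i with a_n = 1 - exp(-alpha * n).
    """
    if rate <= alpha:
        return 0.0                                 # the event has probability 1
    log_a = math.log1p(-math.exp(-alpha * n))      # log a_n, computed stably
    # (1/n) log 1/P(i) = alpha + i * (-log a_n) / n, so the event is {Z_n >= k}.
    k = math.ceil(n * (rate - alpha) / (-log_a))
    return -k * log_a / n                          # Pr{Z_n >= k} = a_n^k

for n in (10, 20, 40):
    print(n, sigma_n(1.2, 0.5, n))                 # approaches 1.2 - 0.5 = 0.7
```

The small excess over $R - \alpha$ at finite $n$ comes only from rounding $k$ up to an integer and vanishes as $n$ grows.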



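Example 1.9.11. In all the examples given in this section, $\sigma(R)$ is a continuous function of $R$. However, we can construct an example with a discontinuous $\sigma(R)$. Let a source alphabet be $\mathcal{X} = \{0, 1\}$ and define a subset $S_n \subset \mathcal{X}^n$ satisfying $|S_n| = 2^{\alpha n}$ arbitrarily, where $\alpha$ is a constant satisfying $0 < \alpha < 1$. Furthermore, fix $\mathbf{x}_0, \mathbf{x}_1 \in \mathcal{X}^n - S_n$ with $\mathbf{x}_0 \ne \mathbf{x}_1$ arbitrarily. Let us consider the general source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ whose probability distribution is given by

$$P_{X^n}(\mathbf{x}) = \begin{cases} 2^{-2\alpha n} & \text{for } \mathbf{x} \in S_n, \\ 2^{-3\alpha n} & \text{for } \mathbf{x} = \mathbf{x}_1, \\ 1 - 2^{-\alpha n} - 2^{-3\alpha n} & \text{for } \mathbf{x} = \mathbf{x}_0, \\ 0 & \text{for } \mathbf{x} \notin S_n \cup \{\mathbf{x}_1, \mathbf{x}_0\}. \end{cases} \tag{1.9.77}$$

Here, it is clear that $P_{X^n}(S_n) = 2^{-\alpha n}$. From simple computation the entropy-spectrum turns out to be the three-point spectrum with peaks at $-\frac{1}{n}\log(1 - 2^{-\alpha n} - 2^{-3\alpha n})$, $2\alpha$ and $3\alpha$ of probabilities $1 - 2^{-\alpha n} - 2^{-3\alpha n}$, $2^{-\alpha n}$ and $2^{-3\alpha n}$, respectively. Thus, $\sigma(R)$ can be computed as

$$\sigma(R) = \begin{cases} 0 & \text{for } R \le 0, \\ \alpha & \text{for } 0 < R \le 2\alpha, \\ 3\alpha & \text{for } 2\alpha < R \le 3\alpha, \\ +\infty & \text{for } 3\alpha < R \end{cases} \tag{1.9.78}$$

from its definition. Then, $R - \sigma(R)$ is expressed as

$$R - \sigma(R) = \begin{cases} R & \text{for } R \le 0, \\ R - \alpha & \text{for } 0 < R \le 2\alpha, \\ R - 3\alpha & \text{for } 2\alpha < R \le 3\alpha, \\ -\infty & \text{for } 3\alpha < R. \end{cases} \tag{1.9.79}$$

From Theorem 1.9.1 we obtain the formula

$$R_e(r|\mathbf{X}) = \begin{cases} \alpha & \text{for } r > \alpha, \\ 0 & \text{for } 0 < r \le \alpha. \end{cases} \tag{1.9.80}$$

We note here that for $r > \alpha$, $\sup_{R \ge 0}$ on the right-hand side of (1.9.3) is attained at $R = R^\circ = 2\alpha$ as follows:

$$\sup_{R \ge 0}\{R - \sigma(R) \mid \sigma(R) < r\} = R^\circ - \sigma(R^\circ) \quad (R^\circ = 2\alpha) \quad = \alpha.$$

Therefore, if we consider the case of $r > 3\alpha$, $\sup_{R \ge 0}$ is not attained at the boundary $\overline{R} = \sup\{R \mid \sigma(R) < r\} = 3\alpha$ of $\{R \mid \sigma(R) < r\}$ but is attained at its interior point $R = R^\circ = 2\alpha$. This phenomenon is peculiar to source coding of "general sources" not satisfying the consistency condition. We cannot find this phenomenon as far as we treat ordinary sources given in the preceding examples.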
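The step from the piecewise $\sigma(R)$ of (1.9.78) to the formula (1.9.80) can be verified by brute force. The following is a minimal sketch that evaluates $\sup_{R \ge 0}\{R - \sigma(R) \mid \sigma(R) < r\}$ on a grid of rates; the function names, the grid, and the value $\alpha = 0.25$ are illustrative assumptions.

```python
import math

def sigma(rate, alpha):
    """Piecewise-constant sigma(R) of the three-point-spectrum source."""
    if rate <= 0:
        return 0.0
    if rate <= 2 * alpha:
        return alpha
    if rate <= 3 * alpha:
        return 3 * alpha
    return math.inf

def achievable_rate(r, alpha, grid=6000):
    """Grid evaluation of sup{R - sigma(R) : sigma(R) < r} over R in [0, 4*alpha]."""
    best = -math.inf
    for k in range(grid + 1):
        rate = 4 * alpha * k / grid
        s = sigma(rate, alpha)
        if s < r:
            best = max(best, rate - s)
    return best

alpha = 0.25
for r in (0.1, 0.25, 0.3, 1.0):
    print(r, achievable_rate(r, alpha))
```

For every $r > \alpha$, including $r > 3\alpha$, the maximizing grid point is $R = 2\alpha$, an interior point of the feasible set, in agreement with the closing remark of the example.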



1.10 Source Coding and Large Deviation: Probability of 
Correct Decoding 

As was shown in Theorem 1.3.1 in §1.3 (fixed-length coding), if a rate $R$ satisfies $R < \overline{H}(\mathbf{X})$, any $(n, M_n, \varepsilon_n)$-code satisfying $\limsup_{n\to\infty} \frac{1}{n}\log M_n \le R$ cannot satisfy $\lim_{n\to\infty} \varepsilon_n = 0$. In addition, the proof of Theorem 1.5.1 in §1.5 (strong converse theorem) tells us that, if $R$ becomes small and satisfies $R < \underline{H}(\mathbf{X})$, we have $\lim_{n\to\infty} \varepsilon_n = 1$ for any $(n, M_n, \varepsilon_n)$-code satisfying $\limsup_{n\to\infty} \frac{1}{n}\log M_n \le R$. In such a source coding problem with a low rate, we need to analyze the large deviation behavior of $1 - \varepsilon_n$, the probability of correct decoding, rather than that of the error probability $\varepsilon_n$ that has been developed in the preceding section. Accordingly, in this section we describe the fixed-length source coding required to satisfy

$$\limsup_{n\to\infty} \frac{1}{n}\log\frac{1}{1 - \varepsilon_n} \le r \quad (r > 0) \tag{1.10.1}$$

instead of (1.9.1). It is fundamental to analyze how we can make the coding rate small subject to the constraint (1.10.1). The information-spectrum approach plays an essential role in this section as well as in the preceding section.

First, we give two definitions corresponding to Definition 1.9.1 and Definition 1.9.2, respectively.

Definition 1.10.1.

Rate $R$ is $r$-achievable $\overset{\text{def}}{\Longleftrightarrow}$ there exists an $(n, M_n, \varepsilon_n)$-code satisfying

$$\limsup_{n\to\infty} \frac{1}{n}\log\frac{1}{1 - \varepsilon_n} \le r \quad \text{and} \quad \limsup_{n\to\infty} \frac{1}{n}\log M_n \le R.$$

Definition 1.10.2 (Infimum r-achievable fixed-length coding rate: Part 2).

$$R_e^*(r|\mathbf{X}) = \inf\{R \mid R \text{ is } r\text{-achievable}\}.$$



The objective of this section is finding this $R_e^*(r|\mathbf{X})$ as a (right-continuous and monotone decreasing) function of $r$ (the inverse of $R_e^*(r|\mathbf{X})$ as well as the inverse of $R_e(r|\mathbf{X})$ (Definition 1.9.2) are called the reliability function of a source $\mathbf{X}$). To this end, let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ be a general source and define

$$\sigma^*(R) = \lim_{n\to\infty} \frac{1}{n}\log\frac{1}{\Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \le R \right\}}. \tag{1.10.2}$$

This $\sigma^*(R)$ can be regarded as a dual counterpart of the function $\sigma(R)$ in (1.9.2). However, notice that the right-hand side of (1.10.2) is assumed to have a limit. Clearly, $\sigma^*(R)$ is a monotone decreasing function of $R$. The following lemma shows the domain of $R$ satisfying $\sigma^*(R) = 0$.



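Lemma 1.10.1. $\sigma^*(R) = 0$ for $R > \underline{H}(\mathbf{X})$.

Proof. If $R > \underline{H}(\mathbf{X})$, the definition of $\underline{H}(\mathbf{X})$ implies that there are infinitely many $n$ satisfying

$$\Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \le R \right\} \ge \varepsilon_0$$

for some $0 < \varepsilon_0 < 1$. Therefore, we obtain

$$\sigma^*(R) \le \liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\varepsilon_0} = 0. \qquad □$$

This lemma tells us that $R \le \underline{H}(\mathbf{X})$ must be satisfied for satisfying $\sigma^*(R) > 0$.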

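We have the following theorem that is a dual part of Theorem 1.9.1.

Theorem 1.10.1 (Han [39]). Assume that the limit in (1.10.2) exists. Then, for any $r > 0$ ^

$$R_e^*(r|\mathbf{X}) = \inf\left\{ h \ge 0 \,\middle|\, \inf_{R \ge 0}\{\sigma^*(R) + [R - \sigma^*(R) - h]^+\} \le r \right\}, \tag{1.10.3}$$

where $[x]^+ = \max(x, 0)$.

Remark 1.10.1. Since

$$\sigma^*(R) = 0 \quad (R > \underline{H}(\mathbf{X}))$$

from Lemma 1.10.1, $\inf_{R \ge 0}$ on the right-hand side is attained at $R \le \underline{H}(\mathbf{X})$. Therefore, if $\sigma^*(R)$ is continuous at $R = \underline{H}(\mathbf{X})$, $\inf_{R \ge 0}$ on the right-hand side of (1.10.3) can be replaced with $\inf_{0 \le R \le \underline{H}(\mathbf{X})}$. □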



^ Recently, another expression for $R_e(r|\mathbf{X})$ was given by Iriyama [54].



1.10 Source Coding and Large Deviation: Probability of Correct Decoding 



87 






(1.10.4) 



Proof of Theorem 1 . 10 . 1 . 

1) Direct part: 

Denote by T* (a) the set defined by 
First, set 

R* = inf {i? > 0 I < r} . 

Denote by 

the subintervals in (^* — +oo), each of which has width 7, where 7 > 0 

is an arbitrary small constant. We partition the set {T*{R* — into the 
following sets according to the subintervals {information- spectrum slicing): 

T 

Setting <ii = + ^7 — for simplicity, ( 1 . 10 . 4 ) can be written as 

— {di 7? di] (i = 1, 2, • • •). 

We first evaluate Pr G Notice that Pr G Tn^^j can be ex- 

pressed as 

Pr {x” e } = Pr {X” € T*(d,)} - Pr {X” e - 7)} . 

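Since we have

$$\sigma^*(d_i) = \lim_{n\to\infty} \frac{1}{n} \log \frac{1}{\Pr\left\{ \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} \le d_i \right\}},$$

$$\sigma^*(d_i - \gamma) = \lim_{n\to\infty} \frac{1}{n} \log \frac{1}{\Pr\left\{ \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} \le d_i - \gamma \right\}}$$

from the definition of $\sigma^*(R)$, it follows that

$$\Pr\{X^n \in T_n^{(i)}\} \ge e^{-n(\sigma^*(d_i) + \tau_n)} - e^{-n(\sigma^*(d_i - \gamma) - \tau_n)} \quad (\forall n \ge n_0), \tag{1.10.5}$$

where $\{\tau_n\}$ is an adequate sequence of positive real numbers satisfying $\tau_n \to 0$
as $n \to \infty$. By setting

$$\mathcal{L}_\gamma = \{ i = 1, 2, \cdots \mid \sigma^*(d_i - \gamma) > \sigma^*(d_i) \},$$

we have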






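$$\sigma^*(d_i) < \sigma^*(d_i - \gamma)$$

for $i \in \mathcal{L}_\gamma$, which enables us to arbitrarily choose a sufficiently small constant
$\delta > 0$ satisfying

$$\sigma^*(d_i) + \tau_n + \delta < \sigma^*(d_i - \gamma) - \tau_n$$

for sufficiently large $n$. Accordingly, (1.10.5) yields

$$\Pr\{X^n \in T_n^{(i)}\} \ge \frac{1}{2} e^{-n(\sigma^*(d_i) + \tau_n)} \quad (\forall i \in \mathcal{L}_\gamma;\ \forall n \ge n_0). \tag{1.10.6}$$

Similarly, we obtain

$$\Pr\{X^n \in T_n^{(i)}\} \le e^{-n(\sigma^*(d_i) - \tau_n)} \quad (\forall i = 1, 2, \cdots;\ \forall n \ge n_0). \tag{1.10.7}$$

Since $P_{X^n}(\mathbf{x}) \ge e^{-n d_i}$ for $\mathbf{x} \in T_n^{(i)}$, (1.10.7) implies that

$$|T_n^{(i)}| \le e^{n(d_i - \sigma^*(d_i) + \tau_n)}. \tag{1.10.8}$$

In addition, by using $P_{X^n}(\mathbf{x}) < e^{-n(d_i - \gamma)}$ for $\mathbf{x} \in T_n^{(i)}$, it follows from (1.10.6)
that

$$|T_n^{(i)}| \ge \frac{1}{2} e^{n(d_i - \gamma - \sigma^*(d_i) - \tau_n)} \quad (\forall i \in \mathcal{L}_\gamma;\ \forall n \ge n_0). \tag{1.10.9}$$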

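Now, define

$$h_0 = \inf\left\{ h \ge 0 \,\middle|\, \inf_{R \ge 0} \left\{ \sigma^*(R) + [R - \sigma^*(R) - h]^+ \right\} \le r \right\}. \tag{1.10.10}$$

Then, there exists an $h_1$ satisfying $h_1 < h_0 + \delta$ and

$$\inf_{R \ge 0} \left\{ \sigma^*(R) + [R - \sigma^*(R) - h_1]^+ \right\} \le r, \tag{1.10.11}$$

where $\delta > 0$ is an arbitrarily small constant. Furthermore, since $\sigma^*(R)$ is a
monotone decreasing function of $R$, there exists an $R_1$ such that

$$\lim_{\varepsilon \downarrow 0} \left( \sigma^*(R_1 + \varepsilon) + [R_1 + \varepsilon - \sigma^*(R_1 + \varepsilon) - h_1]^+ \right) \le r. \tag{1.10.12}$$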



Denoting by infimum of such i^i, (1.10.12) leads to limcr*(^]^ 

ej.0 



which means that 



+ £) < r. 



(7*{Ri + £) < r 

for any e > 0 because cr*(jR) is a monotone decreasing function of R. This 
guarantees R^ > R^. Then, there exists an i satisfying Ri ^ R. Denoting by 
io such i, io ^ Cry must be satisfied. Otherwise, since we have 

a^{R) = a%R,) {R, - j <^R < R,), 

(1.10.12) and setting Ri = — 7 holds. However, Ri — — j < R^ 

contradicts the fact that R^ is defined as the infimum. 






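Here, we first have

$$|T_n^{(i_0)}| \le e^{n(d_{i_0} - \sigma^*(d_{i_0}) + \tau_n)}$$

from (1.10.8). Set $M_n = e^{n h_1}$ and consider the following $(n, M_n, \varepsilon_n)$-code. If

$$h_1 \ge d_{i_0} - \sigma^*(d_{i_0}) + \tau_n, \tag{1.10.13}$$

we define the set $S_n$ as $S_n = T_n^{(i_0)}$, since $|T_n^{(i_0)}| \le M_n$ is satisfied.
Otherwise, we define an arbitrary subset of $T_n^{(i_0)}$ of size

$$\min(M_n, |T_n^{(i_0)}|)$$

as $S_n$. In this case, since we obtain

$$|T_n^{(i_0)}| \ge \frac{1}{2} e^{n(h_1 - \gamma - 2\tau_n)}, \qquad \frac{1}{2} e^{n(h_1 - \gamma - 2\tau_n)} < M_n$$

by substituting $h_1 < d_{i_0} - \sigma^*(d_{i_0}) + \tau_n$ into (1.10.9) and setting $i = i_0 \in \mathcal{L}_\gamma$,
we note that

$$|S_n| = \min(M_n, |T_n^{(i_0)}|) \ge \frac{1}{2} e^{n(h_1 - \gamma - 2\tau_n)} \tag{1.10.14}$$

holds. Now, we define the encoding function $\varphi_n : \mathcal{X}^n \to \mathcal{M}_n = \{1, 2, \cdots, M_n\}$
that maps each element of $S_n$ to a distinct element of $\mathcal{M}_n$ in the order of
$1, 2, \cdots$ and the other elements to $1 \in \mathcal{M}_n$. The decoding function $\psi_n : \mathcal{M}_n \to \mathcal{X}^n$
is defined as the inverse of $\varphi_n|_{S_n}$. Then, this $(n, M_n, \varepsilon_n)$-code satisfies

$$1 - \varepsilon_n = \Pr\{X^n = \psi_n(\varphi_n(X^n))\} = \Pr\{X^n \in S_n\}. \tag{1.10.15}$$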

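First, consider the case that (1.10.13) does not hold. Recalling that $P_{X^n}(\mathbf{x}) \ge e^{-n d_{i_0}}$
for $\mathbf{x} \in T_n^{(i_0)}$, in this case it follows from (1.10.14) that

$$\Pr\{X^n \in S_n\} = \sum_{\mathbf{x} \in S_n} P_{X^n}(\mathbf{x}) \ge |S_n|\, e^{-n d_{i_0}} \ge \frac{1}{2} e^{-n(d_{i_0} - h_1 + \gamma + 2\tau_n)}. \tag{1.10.16}$$

On the other hand, for the case that (1.10.13) holds, we have

$$\Pr\{X^n \in S_n\} \ge \frac{1}{2} e^{-n(\sigma^*(d_{i_0}) + \tau_n)} \tag{1.10.17}$$

from (1.10.6) with setting $i = i_0 \in \mathcal{L}_\gamma$. From (1.10.16) and (1.10.17) we have

$$\Pr\{X^n \in S_n\} \ge \frac{1}{2} e^{-n(\sigma^*(d_{i_0}) + [d_{i_0} - \sigma^*(d_{i_0}) + \tau_n - h_1]^+ + \gamma + \tau_n)}, \tag{1.10.18}$$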






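which is satisfied in either case. We note here that

$$\Pr\{X^n \in S_n\} \ge \frac{1}{2} e^{-n(\sigma^*(d_{i_0}) + [d_{i_0} - \sigma^*(d_{i_0}) - h_1]^+ + \gamma + 2\tau_n)} \tag{1.10.19}$$

because

$$[d_{i_0} - \sigma^*(d_{i_0}) + \tau_n - h_1]^+ \le [d_{i_0} - \sigma^*(d_{i_0}) - h_1]^+ + \tau_n$$

is satisfied.

Though $R_1 \in (d_{i_0} - \gamma, d_{i_0}]$, we can choose a sufficiently small $\gamma > 0$ such
that $R_1 \ne d_{i_0}$ because $R_1$ is a constant independent of $\gamma > 0$. Thus, without
loss of generality, we can assume that $R_1 \in (d_{i_0} - \gamma, d_{i_0})$. Then, (1.10.12)
implies

$$\sigma^*(d_{i_0}) + [d_{i_0} - \sigma^*(d_{i_0}) - h_1]^+ \le r + \mu(\gamma),$$

where $\mu(\gamma)$ is a real number satisfying $\mu(\gamma) \to 0$ as $\gamma \to 0$. Combining this
with (1.10.19), we obtain

$$\Pr\{X^n \in S_n\} \ge \frac{1}{2} e^{-n(r + \gamma + \mu(\gamma) + 2\tau_n)}.$$

By noting that $\tau_n \to 0$ as $n \to \infty$, (1.10.15) guarantees that

$$\limsup_{n\to\infty} \frac{1}{n} \log \frac{1}{1 - \varepsilon_n} \le r + \gamma + \mu(\gamma).$$

Therefore, $h_1$ turns out to be $(r + \gamma + \mu(\gamma))$-achievable. This implies that $h_0 + \delta$
is $(r + \gamma + \mu(\gamma))$-achievable because $h_1 < h_0 + \delta$. Since $\delta > 0$ is arbitrary, any
rate $R$ satisfying $R > h_0$ is proved to be $(r + \gamma + \mu(\gamma))$-achievable.

Finally, by arbitrarily choosing a sequence $\{\gamma_k\}$ satisfying $\gamma_1 > \gamma_2 > \cdots > 0$
and $\gamma_k \to 0$ as $k \to \infty$ and repeating the argument above, using $\gamma = \gamma_1$,
$\gamma = \gamma_2, \cdots$ instead of $\gamma > 0$, we can conclude that any rate $R$ satisfying
$R > h_0$ is $r$-achievable (diagonal line argument; see the proof 1) of Theorem 1.8.2 in §1.8).

2) Converse part:

Let $K > 0$ be a sufficiently large constant specified below. Let $\gamma > 0$ be
an arbitrarily small constant again and set $L = K/\gamma$. Partition the interval
$[0, K)$ into the $L$ subintervals

$$I_i = [(i - 1)\gamma, i\gamma) \quad (i = 1, 2, \cdots, L),$$

each of which has width $\gamma$. Set

$$T_n^{(i)} = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n} \log \frac{1}{P_{X^n}(\mathbf{x})} \in I_i \right\} \quad (i = 1, 2, \cdots, L)$$

and define

$$T_n^{(0)} = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n} \log \frac{1}{P_{X^n}(\mathbf{x})} \ge K \right\}.$$

Clearly,

$$\mathcal{X}^n = T_n^{(0)} \cup \bigcup_{i=1}^{L} T_n^{(i)}. \tag{1.10.20}$$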


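Now, suppose that $h$ is $r$-achievable. That is, suppose that an $(n, M_n, \varepsilon_n)$-code satisfying

$$\limsup_{n\to\infty} \frac{1}{n} \log \frac{1}{1 - \varepsilon_n} \le r, \tag{1.10.21}$$

$$\limsup_{n\to\infty} \frac{1}{n} \log M_n \le h \tag{1.10.22}$$

is given. Then, (1.10.22) implies

$$M_n \le e^{n(h + \tau)} \quad (\forall n \ge n_0), \tag{1.10.23}$$

where $\tau > 0$ is an arbitrarily small constant. On the other hand, we obtain

$$|T_n^{(i)}| \le e^{n(i\gamma - \sigma^*(i\gamma) + \tau)}$$

$(i = 1, \cdots, L)$ in the same way as in deriving (1.10.8) in the proof 1). We now
define

$$A_0 = \{ \mathbf{x} \in \mathcal{X}^n \mid \mathbf{x} = \psi_n(\varphi_n(\mathbf{x})) \}$$

and evaluate $\Pr\{X^n \in T_n^{(i)} \cap A_0\}$ in two different ways. First, in the same
way as (1.10.7), we have

$$\Pr\{X^n \in T_n^{(i)}\} \le e^{-n(\sigma^*(i\gamma) - \tau)} \quad (\forall n \ge n_0),$$

which leads to

$$\Pr\{X^n \in T_n^{(i)} \cap A_0\} \le e^{-n(\sigma^*(i\gamma) - \tau)}. \tag{1.10.24}$$

Secondly, by noticing that $P_{X^n}(\mathbf{x}) \le e^{-n(i-1)\gamma}$ for $\mathbf{x} \in T_n^{(i)}$ $(i = 1, \cdots, L)$, it
follows from (1.10.23) that

$$\Pr\{X^n \in T_n^{(i)} \cap A_0\} = \sum_{\mathbf{x} \in T_n^{(i)} \cap A_0} P_{X^n}(\mathbf{x}) \le \sum_{\mathbf{x} \in T_n^{(i)} \cap A_0} e^{-n(i-1)\gamma} \le M_n e^{-n(i-1)\gamma} \le e^{-n(i\gamma - h - \tau - \gamma)}. \tag{1.10.25}$$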




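The combination of (1.10.24) and (1.10.25) yields

$$\Pr\{X^n \in T_n^{(i)} \cap A_0\} \le e^{-n(\sigma^*(i\gamma) + [i\gamma - \sigma^*(i\gamma) - h]^+ - \tau - \gamma)} \tag{1.10.26}$$

for $i = 1, \cdots, L$.

Next, we evaluate $\Pr\{X^n \in T_n^{(0)} \cap A_0\}$. Since $P_{X^n}(\mathbf{x}) \le e^{-nK}$ for $\mathbf{x} \in T_n^{(0)}$, it follows that

$$\Pr\{X^n \in T_n^{(0)} \cap A_0\} \le \sum_{\mathbf{x} \in T_n^{(0)} \cap A_0} P_{X^n}(\mathbf{x}) \le \sum_{\mathbf{x} \in T_n^{(0)} \cap A_0} e^{-nK} \le M_n e^{-nK} \le e^{-n(K - h - \tau)}. \tag{1.10.27}$$

Therefore, the combination of (1.10.26) and (1.10.27) yields

$$1 - \varepsilon_n = \sum_{i=0}^{L} \Pr\{X^n \in T_n^{(i)} \cap A_0\} \le e^{-n(K - h - \tau)} + \sum_{i=1}^{L} e^{-n(\sigma^*(i\gamma) + [i\gamma - \sigma^*(i\gamma) - h]^+ - \tau - \gamma)},$$

which leads to

$$1 - \varepsilon_n \le e^{-n(K - h - \tau)} + L e^{-n(\rho_0(h) - \tau - \gamma)},$$

where

$$\rho_0(h) = \inf_{R \ge 0} \left\{ \sigma^*(R) + [R - \sigma^*(R) - h]^+ \right\}.$$

If we choose a sufficiently large $K$ satisfying $K - h - \tau > \rho_0(h) - \tau - \gamma$,
the first term on the right-hand side becomes negligible compared with the
second term as $n \to \infty$. Thus, it holds that

$$\limsup_{n\to\infty} \frac{1}{n} \log \frac{1}{1 - \varepsilon_n} \ge \rho_0(h) - \tau - \gamma.$$

Then, (1.10.21) leads to

$$r \ge \rho_0(h) - \tau - \gamma.$$

Since $\tau > 0$ and $\gamma > 0$ are arbitrary, we obtain

$$r \ge \rho_0(h) = \inf_{R \ge 0} \left\{ \sigma^*(R) + [R - \sigma^*(R) - h]^+ \right\}$$

by letting $\tau \to 0$ and $\gamma \to 0$. Therefore, we have established $h_0 \le h$ from
the definition of $h_0$ in (1.10.10). That is, any $r$-achievable rate $h$ cannot be
smaller than $h_0$. □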






Example 1.10.1. Let $\mathbf{X}$ be a stationary memoryless source subject to a
probability distribution $P_X$ over $\mathcal{X}$, where we assume that $H(P_X) < +\infty$
and $\mathcal{X}$ is a countably infinite alphabet. Since $\sigma^*(R) = 0$ for $R > H(P_X)$
and $\sigma^*(R) = \inf_{x \le R} I(x)$ for $R < H(P_X)$, Sanov's theorem always holds for
$R < H(P_X)$ due to (1.9.72) in Example 1.9.6. That is, as in Example 1.9.1,
denoting by $P_R$ the projection to the plane $\pi_R$ in (1.9.19), we have $\sigma^*(R) = 0$
for $R > H(P_X)$ and $\sigma^*(R) = D(P_R\|P_X)$ for $R < H(P_X)$. We note here that
in (1.10.3) we have only to consider $R$ satisfying $R \le H(P_X)$ since we have
$\underline{H}(\mathbf{X}) = H(P_X)$ (see Remark 1.10.1). By noticing that $R - \sigma^*(R) = H(P_R)$
in (1.9.20) holds, Theorem 1.10.1 implies that

$$R_e^*(r|\mathbf{X}) = \inf\left\{ h \ge 0 \,\middle|\, \inf_{R \ge 0} \left\{ D(P_R\|P_X) + [H(P_R) - h]^+ \right\} \le r \right\}. \tag{1.10.28}$$

Since it is easy to verify

$$\inf_{R \ge 0} \left\{ D(P_R\|P_X) + [H(P_R) - h]^+ \right\} = \min_{P_R : H(P_R) \le h} D(P_R\|P_X), \tag{1.10.29}$$

$R_e^*(r|\mathbf{X})$ is given by

$$R_e^*(r|\mathbf{X}) = \inf\left\{ h \ge 0 \,\middle|\, \min_{P_R : H(P_R) \le h} D(P_R\|P_X) \le r \right\} = \min_{Q : D(Q\|P_X) \le r} H(Q). \tag{1.10.30}$$

Now, set $R^*(r) = R_e^*(r|\mathbf{X})$ and define the inverse $\rho^*(R)$ of $R^*(r)$ by

$$\rho^*(R) = \inf\{ r \mid R_e^*(r|\mathbf{X}) \le R \}. \tag{1.10.31}$$

Then, from (1.10.30) $\rho^*(R)$ can be expressed as

$$\rho^*(R) = \min_{Q : H(Q) \le R} D(Q\|P_X)$$

(Fig. 1.15). This is an extension of the formula developed by Csiszár and
Longo [20] treating only sources with finite alphabets. Note that $\rho^*(R) = 0$
for $R \ge H(P_X)$. □
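To make formula (1.10.30) concrete, the following sketch evaluates $\min_{Q : D(Q\|P_X) \le r} H(Q)$ by a plain grid search for a binary memoryless source. The restriction to a binary alphabet, the grid resolution and the function names are assumptions chosen for illustration only; natural logarithms are used.

```python
import math

def entropy(q):
    """Binary entropy H(Q) in nats for Q = (1 - q, q)."""
    return -sum(t * math.log(t) for t in (q, 1.0 - q) if t > 0.0)

def divergence(q, p):
    """Divergence D(Q||P) in nats for binary Q = (1 - q, q), P = (1 - p, p)."""
    total = 0.0
    for a, b in ((q, p), (1.0 - q, 1.0 - p)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total

def correct_decoding_rate(p, r, steps=10000):
    """Grid approximation of min H(Q) over all Q with D(Q||P) <= r."""
    best = entropy(p)  # Q = P is always feasible because D(P||P) = 0
    for i in range(steps + 1):
        q = i / steps
        if divergence(q, p) <= r:
            best = min(best, entropy(q))
    return best

if __name__ == "__main__":
    for r in (0.0, 0.05, 0.2, 0.4):
        print(r, correct_decoding_rate(0.3, r))
```

With $r = 0$ the search returns the entropy of the source, and the value decreases to 0 once $r$ is large enough to admit a deterministic $Q$.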



Example 1.10.2. Let us consider the same stationary irreducible Markov
source $\mathbf{X}$ treated in Example 1.9.2. Here, the source alphabet is assumed to
be finite. If we define the plane $\pi_R$ and the projections $P_R$ and $p_R$ in the
same way as Example 1.9.2, Sanov's theorem implies that, instead of (1.9.25)
and (1.9.26), $\sigma^*(R) = 0$ for $R > H(P|p)$ and

$$\sigma^*(R) = D(P_R\|P|p_R), \tag{1.10.32}$$

$$R - \sigma^*(R) = H(P_R|p_R) \tag{1.10.33}$$

for $R < H(P|p)$. Then, Theorem 1.10.1 yields



Fig. 1.15. [plot of $\rho^*(R)$]



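$$R_e^*(r|\mathbf{X}) = \inf\left\{ h \ge 0 \,\middle|\, \inf_{R \ge 0} \left\{ D(P_R\|P|p_R) + [H(P_R|p_R) - h]^+ \right\} \le r \right\}$$

$$= \inf\left\{ h \ge 0 \,\middle|\, \min_{P_R : H(P_R|p_R) \le h} D(P_R\|P|p_R) \le r \right\}$$

$$= \min_{Q \in \mathcal{V}_0 : D(Q\|P|q) \le r} H(Q|q). \tag{1.10.34}$$

Then, (1.10.34) tells us that the inverse $\rho^*(R)$ defined by (1.10.31) can be
written as

$$\rho^*(R) = \min_{Q \in \mathcal{V}_0 : H(Q|q) \le R} D(Q\|P|q)$$

(cf. Natarajan [72]). □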



Example 1.10.3. Let us consider the unifilar finite-state source $\mathbf{X}$ given in
Example 1.9.3 in §1.9 in order to further generalize Example 1.10.2. We assume
here that the source alphabet and the state alphabet are finite sets.
Under the same notations used in Example 1.9.3, we have the following formula
of $R_e^*(r|\mathbf{X})$ for the unifilar finite-state source $\mathbf{X}$ from Theorem 1.10.1:

$$R_e^*(r|\mathbf{X}) = \inf\left\{ h \ge 0 \,\middle|\, \inf_{R \ge 0} \left\{ D(P_{X_R S_R}\|P|P_{S_R}) + [H(P_{X_R S_R}|P_{S_R}) - h]^+ \right\} \le r \right\}$$

$$= \inf\left\{ h \ge 0 \,\middle|\, \min_{P_{X_R S_R} : H(P_{X_R S_R}|P_{S_R}) \le h} D(P_{X_R S_R}\|P|P_{S_R}) \le r \right\}$$

$$= \min_{P_{XS} \in \mathcal{U}_0 : D(P_{XS}\|P|P_S) \le r} H(P_{XS}|P_S). \tag{1.10.35}$$

In addition, (1.10.35) tells us that its inverse $\rho^*(R)$ defined by (1.10.31) is
given by

$$\rho^*(R) = \min_{P_{XS} \in \mathcal{U}_0 : H(P_{XS}|P_S) \le R} D(P_{XS}\|P|P_S).$$

Example 1.10.4. Let us consider the mixed source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ given in
Example 1.9.4. We assume that $\mathcal{X}$ is a countably infinite source alphabet and
both of $H(P_1) < +\infty$ and $H(P_2) < +\infty$ are satisfied. We define $P_1$ and $P_2$ in
the same way as Example 1.9.4. We also define the following half-spaces:



. ( 1 ) 
R 



. ( 2 ) 
R 



^QeP{X) 



^ Q{x) log 
^ Q{x) log 



1 

1 

P^) 




(1.10.36) 

(1.10.37) 



instead of and Denote by P’^^^ and the projection of Pi and 
P 2 to and respectively. Then, by using the same argument 

in Example 1.9.4, we obtain 



$$\sigma^*(R) = \min\left( D(P_R^{*(1)}\|P_1),\ D(P_R^{*(2)}\|P_2) \right). \tag{1.10.38}$$



If we substitute this $\sigma^*(R)$ into the right-hand side of (1.10.3) in Theorem 1.10.1,
we obtain the formula of $R_e^*(r|\mathbf{X})$ for the mixed source.

Here, it is easy to notice from (1.10.38) that $\sigma^*(R) = 0$ for $R \ge \min(H(P_1), H(P_2))$
and $\sigma^*(R)$ is a monotone decreasing and continuous function of $R$. Thus, we obtain

$$R_e^*(r|\mathbf{X}) \le \min(H(P_1), H(P_2)) \quad (\forall r > 0). \tag{1.10.39}$$

In addition, since (1.10.38) tells us that $\sigma^*(R) > 0$ for any $R$ satisfying
$R < \min(H(P_1), H(P_2))$, $h \ge \min(H(P_1), H(P_2))$ must be satisfied in order
to have

$$\sigma^*(R) + [R - \sigma^*(R) - h]^+ = 0.$$

That is, if

$$h < \min(H(P_1), H(P_2)),$$

then

$$\inf_{R \ge 0} \left\{ \sigma^*(R) + [R - \sigma^*(R) - h]^+ \right\} \ge \exists\gamma(h) > 0$$

must be satisfied. This argument establishes

$$\lim_{r \downarrow 0} R_e^*(r|\mathbf{X}) = \min(H(P_1), H(P_2)). \tag{1.10.40}$$






Remark 1.10.2. Unfortunately, a simple formula on $R_e^*(r|\mathbf{X})$ corresponding
to (1.9.44) in Remark 1.9.3 giving $R_e(r|\mathbf{X})$ for mixed sources $\mathbf{X}$ does not
hold. □



Example 1.10.5. Let us consider the case that a source $\mathbf{X} = \{X^n = (X_1, X_2, \cdots, X_n)\}_{n=1}^{\infty}$
is the autoregressive process given in Example 1.9.9
in §1.9. By using the same argument as Example 1.9.9 based on Sanov's
theorem, we obtain the following formula:

$$R_e^*(r|\mathbf{X}) = \min_{Q : D(Q\|P_W) \le r} H(Q),$$

where, in this case, the alphabet $\mathcal{W}$ can be a countably infinite set of real
numbers. □



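Example 1.10.6. Consider the source $\mathbf{Z}$ subject to the geometric distribution
(1.9.76) in Example 1.9.10. This source is nonstationary and nonergodic.
Since we have

$$\sigma^*(R) = \begin{cases} +\infty & \text{for } R < a, \\ 0 & \text{for } R \ge a \end{cases}$$

from simple computation, we obtain

$$R_e^*(r|\mathbf{Z}) = \inf\left\{ h \ge 0 \,\middle|\, \inf_{R \ge 0} \left\{ \sigma^*(R) + [R - \sigma^*(R) - h]^+ \right\} \le r \right\} = \begin{cases} a - r & \text{for } r < a, \\ 0 & \text{for } r \ge a. \end{cases}$$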
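The closed form in Example 1.10.6 can be checked by evaluating the right-hand side of (1.10.3) directly on a grid. The sketch below does this for the two-valued $\sigma^*(R)$ of the example; the grid spacing, the choice $a = 1$ and the function names are assumptions made for illustration.

```python
import math

def sigma_star(R, a):
    """Two-valued sigma*(R) of Example 1.10.6."""
    return math.inf if R < a else 0.0

def rho0(h, a, grid):
    """inf over R in grid of sigma*(R) + [R - sigma*(R) - h]^+."""
    best = math.inf
    for R in grid:
        s = sigma_star(R, a)
        if math.isinf(s):
            continue  # the objective is +infinity here
        best = min(best, s + max(R - s - h, 0.0))
    return best

def correct_decoding_rate(r, a, grid):
    """Smallest h in grid with rho0(h) <= r, i.e. a grid version of (1.10.3)."""
    for h in grid:
        if rho0(h, a, grid) <= r:
            return h
    return math.inf

if __name__ == "__main__":
    grid = [i / 100 for i in range(301)]  # 0.00, 0.01, ..., 3.00
    for r in (0.25, 0.5, 1.5):
        print(r, correct_decoding_rate(r, 1.0, grid))
```

On this grid the result agrees with $a - r$ for $r < a$ and with 0 for $r \ge a$.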



1.11 Reliability Functions of the General Source with 
Variable-Length Coding 

So far, on the basis of the technique of information-spectrum slicing, we have
established the unifying formulas for the infima $R_e(r|\mathbf{X})$ and $R_e^*(r|\mathbf{X})$ of
achievable fixed-length coding rates with the general source $\mathbf{X}$ given the error
probability exponent $r > 0$ and the correct probability exponent $r > 0$,
respectively. We have also shown many examples to reveal not only that all
the previously known results on this kind of infimum achievable rate problem
follow immediately and in a unified way from these general formulas,
but also that novel results and useful insights concerning general sources,
particularly those with countably infinite alphabets, can be obtained by our
approach. It has thus been shown that, in terms of its potential, the
information-spectrum method can go beyond the method of types. This is one of
the significant advantages of the information-spectrum method.






One may wonder, however, why we have not formulated these problems
in the inverse form, i.e., in the form of the reliability functions as usual.
The reason is that we would not have been able to invoke the key technique
of information-spectrum slicing effectively if we had formulated the
problems in the usual reliability function forms. Full generality is sometimes
spawned only from sound formulations. This is an interesting observation.

Finally, some comments on the relation to variable-length source coding
problems follow. In the context of prefix variable-length coding with the
general source $\mathbf{X}$, we can also consider the problem of establishing the unifying
formulas for the infima $L_e(r|\mathbf{X})$ and $L_e^*(r|\mathbf{X})$ of (normalized) achievable
codeword length thresholds given the overflow probability exponent $r > 0$ and
the underflow probability exponent $r > 0$, respectively. Surprisingly enough,
it turns out that

$$L_e(r|\mathbf{X}) = R_e(r|\mathbf{X}) \quad (\forall r > 0), \tag{1.11.1}$$

$$L_e^*(r|\mathbf{X}) = R_e^*(r|\mathbf{X}) \quad (\forall r > 0) \tag{1.11.2}$$

always hold for any general source $\mathbf{X}$ with countably infinite source alphabet.
Readers who are interested in the details may refer to Uchida and Han [87].



1.12 Information Spectrum and Invariancy 

So far we have described various kinds of coding based on the notion of
the information-spectrum. We conclude this chapter by referring to basic
properties of the information-spectrum.

We begin with a simple example. Set $\mathcal{X} = \{a, b, c\}$ and consider the source
with probability distribution

$$P_X(a) = \frac{1}{3}, \quad P_X(b) = \frac{1}{2}, \quad P_X(c) = \frac{1}{6}.$$

We denote this source by

$$X = \begin{pmatrix} a & b & c \\ \frac{1}{3} & \frac{1}{2} & \frac{1}{6} \end{pmatrix}. \tag{1.12.1}$$

Next, we transform the source X into another source by permuting source 
symbols a, b and c as follows: 

(1.12.2) 

(we call this operation the renaming of the source symbols). Then, the
following question arises: from the viewpoint of information theory, should we
identify $X$ with the renamed source or treat them as two different sources? We
notice that the entropy $H(X)$ coincides with the entropy of the renamed source.
Since it is a basic standpoint in information theory that source symbols should
be distinguishable and do not contain any particular meanings, we can consider
that we need to identify $X$ with the renamed source in information theory. On
the other hand, there are situations in which each symbol carries an intrinsic
meaning. In such a situation we treat $X$ and the renamed source as two
different sources.

Let us consider another example. Suppose that $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is the
stationary memoryless source with the probability distribution $P_X(0) = \frac{1}{3}$
and $P_X(1) = \frac{2}{3}$ over the source alphabet $\mathcal{X} = \{0, 1\}$. If we consider the case
of $n = 4$ as an example, the probability distribution of the binary sequences
is written as

    binary sequence x    probability
    0000                 (1/3)^4
    0001                 (1/3)^3 (2/3)
    0010                 (1/3)^3 (2/3)
    0011                 (1/3)^2 (2/3)^2
    0100                 (1/3)^3 (2/3)
    0101                 (1/3)^2 (2/3)^2
    0110                 (1/3)^2 (2/3)^2
    0111                 (1/3) (2/3)^3
    1000                 (1/3)^3 (2/3)
    1001                 (1/3)^2 (2/3)^2
    1010                 (1/3)^2 (2/3)^2
    1011                 (1/3) (2/3)^3
    1100                 (1/3)^2 (2/3)^2
    1101                 (1/3) (2/3)^3
    1110                 (1/3) (2/3)^3
    1111                 (2/3)^4

We can find the remarkable property that each probability only depends
on the type (the empirical distribution) of $\mathbf{x} \in \{0, 1\}^4$. It is this property
that gives a basis of the theories on discrete information systems (called the
type theory) integrated by Csiszár and Körner [19] in their famous book titled
Information Theory: Coding Theorems for Discrete Memoryless Systems
(1981).
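The type property can be verified mechanically for $n = 4$. The following sketch groups the sixteen sequences by their number of ones and checks that each group carries a single probability value. It assumes $P_X(0) = 1/3$ and $P_X(1) = 2/3$, which is how the table entries read; the function names are chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product
from math import comb

P = {"0": Fraction(1, 3), "1": Fraction(2, 3)}  # assumed symbol probabilities
N = 4

def sequence_probability(seq):
    """Probability of a sequence under the memoryless source P."""
    out = Fraction(1)
    for symbol in seq:
        out *= P[symbol]
    return out

def probabilities_by_type(n):
    """Map: number of ones -> set of probabilities seen for that type."""
    table = {}
    for bits in product("01", repeat=n):
        seq = "".join(bits)
        table.setdefault(seq.count("1"), set()).add(sequence_probability(seq))
    return table

if __name__ == "__main__":
    for ones, values in sorted(probabilities_by_type(N).items()):
        # comb(N, ones) sequences share the single value in `values`
        print(ones, comb(N, ones), values)
```

Each type class contains $\binom{4}{k}$ sequences with the common probability $(1/3)^{4-k}(2/3)^k$, so the class probabilities sum to 1.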

Now, we rename the alphabet $\{0, 1\}^4$ as in the following table under the
same probability distribution.

    binary sequence x    probability
    1010                 (1/3)^4
    1101                 (1/3)^3 (2/3)
    0100                 (1/3)^3 (2/3)
    0110                 (1/3)^2 (2/3)^2
    1011                 (1/3)^3 (2/3)
    0000                 (1/3)^2 (2/3)^2
    0001                 (1/3)^2 (2/3)^2
    1000                 (1/3) (2/3)^3
    1111                 (1/3)^3 (2/3)
    1001                 (1/3)^2 (2/3)^2
    0111                 (1/3)^2 (2/3)^2
    0010                 (1/3) (2/3)^3
    1110                 (1/3)^2 (2/3)^2
    1100                 (1/3) (2/3)^3
    0101                 (1/3) (2/3)^3
    0011                 (2/3)^4

We call the source obtained in this way the renamed source. We notice that
the probabilities of the binary sequences are no longer determined by their
types and the renamed source becomes a source without stationarity
and ergodicity. That is, the structure as a stationary memoryless process
does not remain invariant after the renaming of elements of $\mathcal{X}^n$ for each
block length $n$. The notion of types no longer makes sense. If we call the
study investigating properties and quantities that remain invariant after the 
renaming “information theory,” the most fundamental stationary memoryless 
sources are beyond the scope of information theory! The situation is the same 
if we consider Markov sources and stationary ergodic sources that are familiar 
to us in information theory. Then, what should we study in information 
theory? 

In fact, we have a property that remains invariant under the renaming.
The property is nothing but the information-spectrum. In the example above
treating a binary source, the information-spectrum of the original source
coincides with the information-spectrum of the renamed source. As readers may
have noticed, every theorem in this chapter, excluding theorems in §1.1 and
§1.2, just describes one of the properties of such an invariant
information-spectrum. All the theorems given in the following chapters have
their basis only in the invariancy of the information-spectrum, excluding
theorems that require additional assumptions such as being stationary or
memoryless. (Here, in the channel coding treated in Chapter 3, we consider the
renaming of the symbols in an input alphabet and an output alphabet separately,
i.e., we do not permit the renaming of an input symbol as an output symbol.
Otherwise, the notion of the channel does not make sense.) We call the theory
that is based on the information-spectrum and requires no assumption the
invariant theory. The main objective of this book is the investigation of such
an invariant theory.
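The invariance claimed above can be checked for the $n = 4$ example. The sketch below applies the renaming listed in the second table (row by row, against the lexicographic order of the first table) and compares two things: whether probabilities are still determined by the type, and whether the multiset of probability values, which determines the information-spectrum, is unchanged. The symbol probabilities $P_X(0) = 1/3$, $P_X(1) = 2/3$ and all identifiers are assumptions made for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

P = {"0": Fraction(1, 3), "1": Fraction(2, 3)}  # assumed symbol probabilities

def sequence_probability(seq):
    out = Fraction(1)
    for symbol in seq:
        out *= P[symbol]
    return out

# Original labels in lexicographic order: 0000, 0001, ..., 1111.
ORIGINAL = ["".join(bits) for bits in product("01", repeat=4)]

# Renamed labels, read row by row from the second table.
RENAMED = ["1010", "1101", "0100", "0110", "1011", "0000", "0001", "1000",
           "1111", "1001", "0111", "0010", "1110", "1100", "0101", "0011"]

p_original = {x: sequence_probability(x) for x in ORIGINAL}
p_renamed = {new: sequence_probability(old) for old, new in zip(ORIGINAL, RENAMED)}

def depends_only_on_type(distribution):
    """True if sequences with the same number of ones share one probability."""
    by_type = {}
    for seq, value in distribution.items():
        by_type.setdefault(seq.count("1"), set()).add(value)
    return all(len(values) == 1 for values in by_type.values())

def spectrum(distribution):
    """Multiset of probability values; it fixes the law of log 1/P(X^n)."""
    return Counter(distribution.values())

if __name__ == "__main__":
    print(depends_only_on_type(p_original), depends_only_on_type(p_renamed))
    print(spectrum(p_original) == spectrum(p_renamed))
```

The type property is lost after the renaming, while the spectrum is identical, which is the sense in which the information-spectrum is invariant.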

So far

$$\mathbf{X} = \{X^n\}_{n=1}^{\infty} \tag{1.12.3}$$

has been denoting a general source with a source alphabet $\mathcal{X}$. From the
viewpoint of the invariant theory, however, this notation is not adequate.
Here, we have already assumed the "structure" that each element $X^n$ of
the source $\mathbf{X}$ takes values in the $n$-th Cartesian product $\mathcal{X}^n$ of the source
alphabet $\mathcal{X}$. In a stricter sense, a general source should be formulated
as a sequence of arbitrary random variables

$$\mathbf{Z} = \{Z_n\}_{n=1}^{\infty} \tag{1.12.4}$$

on a given sequence of source alphabets

$$\{\mathcal{Z}_n\}_{n=1}^{\infty}, \tag{1.12.5}$$

where $Z_n$ takes values in $\mathcal{Z}_n$ for each $n$. Here, each alphabet $\mathcal{Z}_1, \mathcal{Z}_2, \cdots$ can
be either a finite set or a countably infinite set. Furthermore, the alphabets
can be finite alphabets and countably infinite alphabets by turns. If we treat
this general source $\mathbf{Z}$ together with the notions of the coding rate and the
error probability, we have the source coding problem observed in this chapter.
The general source in (1.12.3) is obtained by choosing $\mathcal{Z}_n = \mathcal{X}^n$ and setting
$Z_n = X^n$. We have chosen the notation in (1.12.3) because the source coding
and the channel coding are conventionally performed blockwise. However,
note that we have actually assumed nothing, though we used such notation.
Readers can check that Theorem 1.9.1 and Theorem 1.10.1 still hold even
if $Z_n$ and $\mathcal{Z}_n$ are used instead of $X^n$ and $\mathcal{X}^n$, respectively. In addition, all
of the theorems in this chapter and the following chapters, excluding theorems
that impose additional assumptions, hold for the general source given
in (1.12.4). Consequently, they constitute the most general form, or the core,
of information theory. If we view various theorems in information theory from the
standpoint of the invariant theory, we can judge whether the theorems hold
for sources with some structure or they invariantly hold without any assumption
on the sources. This provides an extremely important viewpoint
for studies of information theory. For example, while Theorem 1.9.1 in
§1.9 invariantly holds for all sources, Example 1.9.1 only holds for stationary
memoryless sources. More correctly, we should say that Example 1.9.1 holds
for the class of the sources obtained from the renaming of stationary memoryless
sources. In this sense we can consider that Example 1.9.1 invariantly
holds for such a class of sources.

Though we have mentioned that the structure as a stochastic process such
as being stationary or memoryless is not invariant under the renaming, this
happens when we rename elements in $\mathcal{X}^n$ for each block length $n$. However,
from the standpoint of permitting the renaming only of elements in $\mathcal{X}$, all of
the structures as a stochastic process such as stationarity and ergodicity
become invariant. Furthermore, since many practical sources can be regarded
as stochastic processes with some characteristic structures, it is quite natural
that studies of stochastic processes hold an important position in information
theory, regardless of the invariancy.

Summarizing, it is necessary for us to strictly distinguish the facts that 
invariantly hold under all kinds of the renaming from the facts that hold under 
some renaming keeping a particular structure. We consider that studies of 
information theory progress as we develop connections between the two kinds 
of facts and integrate them after we distinguish them. 

We conclude this chapter with the following remark. So far we have denoted the
entropy density rate by

$$\frac{1}{n} \log \frac{1}{P_{X^n}(X^n)}. \tag{1.12.6}$$

If we drop the assumption that $X^n$ takes values in the $n$-th Cartesian product
$\mathcal{X}^n$ of a source alphabet $\mathcal{X}$, we can generally write the entropy density rate
as

$$\frac{1}{n} \log \frac{1}{P_{Z_n}(Z_n)} \tag{1.12.7}$$

by using the notation in (1.12.4). However, $n$ in (1.12.7) no longer means the
block length since $Z_n$ takes values in a general source alphabet $\mathcal{Z}_n$. Hence,
we need to consider whether we need to divide

$$\log \frac{1}{P_{Z_n}(Z_n)}$$

in (1.12.7) by "$n$" or not. Notice that (1.12.7) does not mean the entropy
density rate per source symbol any more, though we have such a meaning in
(1.12.6). In this general case, however, we can consider $n = 1, 2, \cdots$ as, for
example, discrete time points. Thus, we can replace (1.12.7) with

$$\frac{1}{f(n)} \log \frac{1}{P_{Z_n}(Z_n)} \tag{1.12.8}$$

in general, where $f(n) > 0$ is an arbitrary monotone increasing function
satisfying $f(n) \to \infty$ as $n \to \infty$. Then, what kind of operational meaning
does this entropy density rate have? In order to understand the operational
meaning, let us consider the problem of the "probability of decoding error"
described in §1.9 as an example. First, we define



Definition 1.12.1.

Rate $R$ is $r$-achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists an $(n, M_n, \varepsilon_n)$-code satisfying

$$\liminf_{n\to\infty} \frac{1}{f(n)} \log \frac{1}{\varepsilon_n} \ge r \quad \text{and} \quad \limsup_{n\to\infty} \frac{1}{f(n)} \log M_n \le R.$$

Definition 1.12.2 (Infimum $r$-achievable fixed-length coding rate).

$$R_e(r|\mathbf{X}) = \inf\{ R \mid R \text{ is } r\text{-achievable} \}$$

instead of Definition 1.9.1 and Definition 1.9.2, respectively.

These definitions formulate the problem of how we can make the coding
rate

$$\frac{1}{f(n)} \log M_n \tag{1.12.9}$$

small subject to the constraint that the error probability $\varepsilon_n$ satisfies

$$\varepsilon_n \simeq e^{-f(n) r} \tag{1.12.10}$$

instead of (1.9.1). Here, the coding rate defined in (1.12.9) means that we
are interested in the exponent $R$ when the size $M_n$ of the code is expressed
in the form of

$$M_n \simeq e^{f(n) R}. \tag{1.12.11}$$

We can consider the meaning of $r$ in (1.12.10) similarly. For example, if we
set $f(n) = \log n$, (1.12.9)-(1.12.11) become

$$\frac{\log M_n}{\log n}, \qquad \varepsilon_n \simeq n^{-r}, \qquad M_n \simeq n^{R},$$

respectively. In this case, we are interested in the rate at which the error
probability $\varepsilon_n$ converges to 0 as a power function of $n$ (not as an exponential
function of $n$). If we replace $\sigma(R)$ in (1.9.2) by

$$\sigma(R) = \liminf_{n\to\infty} \frac{1}{f(n)} \log \frac{1}{\Pr\left\{ \frac{1}{f(n)} \log \frac{1}{P_{X^n}(X^n)} \ge R \right\}}, \tag{1.12.12}$$

it is easy to verify that Theorem 1.9.1 holds in the same form under these
definitions. From this point of view, Theorem 1.9.1 treats the special case of
$f(n) = n$. Actually, we have chosen $f(n) = n$ only because setting $f(n) = n$ is
adequate for characterizing a certain aspect of many typical sources. In fact,
the most adequate "scaling factor" $f(n)$ is naturally determined by the respective
aspects that we choose for characterizing information. From this standpoint
we do not need to divide the entropy density rate by the block length $n$
even if source outputs take values in the $n$-th Cartesian product of a source
alphabet.

As is seen above, considering a general source $\mathbf{Z} = \{Z_n\}_{n=1}^{\infty}$ in (1.12.4)
leads to the advantage that we can clarify a form of information that is
invariant with respect to the choice of the scaling factor $f(n)$ in the generalized
entropy density rate (1.12.8). We now have another kind of invariancy, different
from the invariancy under the renaming described in this section.
Note that the remarks given in this section are applicable, though we use
notations similar to (1.12.6) in the following sections.



2 Random Number Generation 



2.1 Random Number Generation 

Random number generation is the transformation of a random variable $X$ into
another random variable $Y$ with a specified probability distribution. One of
the random number generation problems that is most familiar to us
is the one using coin flipping.

Example 2.1.1. We would like to generate the random variable

$$Y = \begin{pmatrix} a & b & c \\ \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \end{pmatrix}$$

satisfying $P_Y(a) = \frac{1}{2}$, $P_Y(b) = \frac{1}{4}$ and $P_Y(c) = \frac{1}{4}$ from flips of the unbiased
coin

$$X = \begin{pmatrix} \text{head} & \text{tail} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix}$$

with $P_X(\text{head}) = \frac{1}{2}$ and $P_X(\text{tail}) = \frac{1}{2}$. To this end, we can use the generation
tree given in Fig. 2.1, where 1 and 0 mean "head" and "tail," respectively.

In this case, the worst number of flips is equal to 2 while the average
number of flips is equal to 1.5. If we leave the average number of flips out of
consideration, the variable-length generation tree in Fig. 2.1 is equivalent to
the fixed-length tree given in Fig. 2.2. □
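A generation tree of this kind can be written down as a prefix-free map from flip sequences to output symbols. The sketch below uses one possible labelling of the leaves (1 gives a, 01 gives b, 00 gives c); since the figure is not available here, this labelling and all identifiers are assumptions, chosen so that the output distribution and the flip counts match the example.

```python
from fractions import Fraction

# Hypothetical leaf labelling of the tree: path of flips -> output symbol.
TREE = {"1": "a", "01": "b", "00": "c"}

def output_distribution(tree):
    """Exact distribution of the generated symbol for a fair coin."""
    dist = {}
    for path, symbol in tree.items():
        dist[symbol] = dist.get(symbol, Fraction(0)) + Fraction(1, 2 ** len(path))
    return dist

def average_flips(tree):
    """Expected number of flips: sum over leaves of depth * 2^(-depth)."""
    return sum(Fraction(len(path), 2 ** len(path)) for path in tree)

def worst_case_flips(tree):
    return max(len(path) for path in tree)

def generate(tree, flips):
    """Walk the tree along the given flips (0/1) until a leaf is reached."""
    path = ""
    for bit in flips:
        path += str(bit)
        if path in tree:
            return tree[path], len(path)
    raise ValueError("not enough flips to reach a leaf")

if __name__ == "__main__":
    print(output_distribution(TREE), average_flips(TREE), worst_case_flips(TREE))
```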







Fig. 2.2. 



Example 2.1.2 (Cover and Thomas [17]). We would like to generate the
random variable

$$Y = \begin{pmatrix} a & b \\ \frac{2}{3} & \frac{1}{3} \end{pmatrix}$$

from flips of the unbiased coin $X$. In this case, we can use the generation
tree given in Fig. 2.3 obtained from the following binary expansion of the
probabilities $\frac{2}{3}$ and $\frac{1}{3}$:

$$\frac{2}{3} = 0.101010\cdots,$$

$$\frac{1}{3} = 0.010101\cdots.$$

Though the worst number of flips is infinity because the size of the generation
tree in Fig. 2.3 is infinite, the average number of flips is equal to 2. □
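The binary expansions above suggest a simple tree: at depth $k$ there is exactly one leaf, labelled with the symbol whose $k$-th binary digit is 1. The sketch below implements this rule under the assumption that a "head" (1) terminates the walk; which branch terminates, and the symbol names a and b, are assumptions, since the figure is not available here.

```python
from fractions import Fraction

def generate(flips):
    """Generate one symbol from an iterable of fair coin flips (0/1).

    Returns (symbol, number_of_flips_used). Odd depths carry the digit 1
    of 2/3 = 0.101010..., even depths the digit 1 of 1/3 = 0.010101...
    """
    for depth, bit in enumerate(flips, start=1):
        if bit == 1:
            return ("a" if depth % 2 == 1 else "b"), depth
    raise ValueError("ran out of flips before reaching a leaf")

def truncated_statistics(max_depth):
    """Exact P(a), P(b) and E[L] restricted to leaves of depth <= max_depth."""
    p_a = sum(Fraction(1, 2 ** k) for k in range(1, max_depth + 1, 2))
    p_b = sum(Fraction(1, 2 ** k) for k in range(2, max_depth + 1, 2))
    mean = sum(Fraction(k, 2 ** k) for k in range(1, max_depth + 1))
    return p_a, p_b, mean

if __name__ == "__main__":
    print(generate([0, 0, 1]))
    print([float(v) for v in truncated_statistics(40)])
```

As the truncation depth grows, the three values approach $2/3$, $1/3$ and 2, matching the probabilities of the example and its average number of flips.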



Example 2.1.3 (Knuth and Yao [58]). 

Suppose that $Y$ is a general random variable taking finitely many values. Then,
there exists a generation tree that generates $Y$ from flips of the unbiased coin
$X$ satisfying

$$H_2(Y) \le E(L) < H_2(Y) + 2, \tag{2.1.1}$$

where $E(L)$ denotes the average number of the flips. In addition, the lower
bound on the leftmost side is valid for all generation trees. □
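As a quick sanity check of (2.1.1), the sketch below compares the binary entropy $H_2(Y)$ with the average numbers of flips stated in Examples 2.1.1 and 2.1.2 (1.5 and 2, respectively). The helper names are assumptions made for this illustration.

```python
import math

def entropy_bits(probabilities):
    """Entropy H_2(Y) in bits of a finite distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def within_bound(probabilities, average_flips):
    """Check H_2(Y) <= E(L) < H_2(Y) + 2 for a given average flip count."""
    h = entropy_bits(probabilities)
    return h <= average_flips < h + 2

if __name__ == "__main__":
    print(entropy_bits([0.5, 0.25, 0.25]), within_bound([0.5, 0.25, 0.25], 1.5))
    print(entropy_bits([2 / 3, 1 / 3]), within_bound([2 / 3, 1 / 3], 2.0))
```

In Example 2.1.1 the lower bound is met with equality, because all target probabilities are powers of $1/2$.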

Fig. 2.3.

Example 2.1.4 (Interval algorithm: Han and Hoshi [42]). Example 2.1.3
can be generalized in the following way. Suppose that $X$ and $Y$ are random
variables subject to general probability distributions $p = (p_1, p_2, \cdots, p_M)$ and
$q = (q_1, q_2, \cdots, q_N)$, respectively. We consider the random number generation
problem in which we generate $Y$ by using the (generalized) coin $X$. Then,
there exists a generation tree with the average number of flips $E(L)$ satisfying

$$\frac{H(Y)}{H(X)} \le E(L) \le \frac{H(Y)}{H(X)} + \frac{\log(2(M - 1))}{H(X)} + \frac{h(p_{\max})}{(1 - p_{\max}) H(X)}, \tag{2.1.2}$$

where $p_{\max} = \max_{1 \le j \le M} p_j$ and $h(\cdot)$ denotes the binary entropy. In addition,
the lower bound on the leftmost side is valid for all generation trees. If we
consider the special case that $X$ is the unbiased coin, i.e., the case of $M = 2$
and $p_1 = p_2 = \frac{1}{2}$, we have

$$H_2(Y) \le E(L) \le H_2(Y) + 3.$$

The upper bound above is greater than the upper bound in (2.1.1) by one.
This is due to the simplicity of the random number generation using the
interval algorithm. □



Example 2.1.5. Let $U$ and $Y$ be the continuous random variables uniformly
distributed over the interval $[0, 1]$ and subject to an arbitrary probability
distribution, respectively. We consider the random number generation problem
in which we transform $U$ into $Y$. Theoretically, this problem is quite easy.
Denoting by $P_Y$ the probability density function and $F(\cdot)$ the distribution function
of $Y$ defined by

$$F(y) = \int_{-\infty}^{y} P_Y(t)\, dt,$$

the random variable $Y$ that we would like to obtain is generated as $Y = F^{-1}(U)$ (Fig. 2.4).

Fig. 2.4.

□
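The transformation $Y = F^{-1}(U)$ is easy to sketch when $F$ has a closed-form inverse. The example below uses the exponential distribution with rate `rate` as the target; this choice of target and all function names are assumptions for illustration, not part of the example above.

```python
import math
import random

def exponential_cdf(y, rate=1.0):
    """Distribution function F(y) = 1 - exp(-rate * y) for y >= 0."""
    return 1.0 - math.exp(-rate * y) if y >= 0 else 0.0

def exponential_inverse_cdf(u, rate=1.0):
    """Inverse F^{-1}(u) = -log(1 - u) / rate for 0 <= u < 1."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    return -math.log1p(-u) / rate

def sample(rate=1.0, rng=random):
    """Draw Y = F^{-1}(U) with U uniform on [0, 1)."""
    return exponential_inverse_cdf(rng.random(), rate)

if __name__ == "__main__":
    random.seed(0)
    draws = [sample(rate=2.0) for _ in range(100000)]
    print(sum(draws) / len(draws))  # should be near 1 / rate
```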

In this chapter we focus on the random number generation when both
$X$ and $Y$ are discrete, where $X$ is the random variable used for the random
number generation (called the coin random number) and $Y$ the random variable
to be generated (called the target random number). In this case, for
arbitrarily given $X$ and $Y$ there is generally no function $\tilde{Y} = \varphi(X)$ that yields
$\tilde{Y}$ exactly satisfying $P_{\tilde{Y}} = P_Y$ by applying $\varphi$ to $X$ just once. Thus, we
consider the generation of the random number $\tilde{Y} = \varphi(X)$ with a probability
distribution $P_{\tilde{Y}}$ that does not coincide with $P_Y$, the probability distribution
of $Y$, but sufficiently approximates $P_Y$ under a certain measure. Such a
problem is also called the probability distribution approximation problem.

We use the following variational distance as a measure of the probability 
distribution approximation. 

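Definition 2.1.1 (Variational distance). Let $P_Z$ and $P_{\tilde{Z}}$ be probability
distributions over a countably infinite set $\mathcal{Z}$. The variational distance between
$P_Z$ and $P_{\tilde{Z}}$ is defined by

$$d(P_Z, P_{\tilde{Z}}) = \sum_{z \in \mathcal{Z}} |P_Z(z) - P_{\tilde{Z}}(z)|,$$

where $d(P_Z, P_{\tilde{Z}})$ is also denoted by $d(Z, \tilde{Z})$.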

Remark 2.1.1. One of the well-known facts on the variational distance is

$$d(P_Z, P_{\tilde{Z}}) = 2 \sup_{A : A \subset \mathcal{Z}} |P_Z(A) - P_{\tilde{Z}}(A)|.$$

If $\mathcal{Z}$ is a general set that is not restricted to a countably infinite set, this
gives the definition of the variational distance.
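For finite supports, both expressions for the variational distance can be computed exactly and compared. The sketch below does so by brute force over all events $A$; the dictionary representation of distributions and the function names are assumptions made for this illustration, and the brute-force search is only practical for small alphabets.

```python
from fractions import Fraction
from itertools import combinations

def variational_distance(p, q):
    """Sum over z of |p(z) - q(z)| for distributions given as dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(z, 0) - q.get(z, 0)) for z in support)

def twice_sup_over_events(p, q):
    """2 * max over all events A of |p(A) - q(A)|, by exhaustive search."""
    support = sorted(set(p) | set(q))
    best = 0
    for size in range(len(support) + 1):
        for event in combinations(support, size):
            gap = abs(sum(p.get(z, 0) for z in event) - sum(q.get(z, 0) for z in event))
            best = max(best, gap)
    return 2 * best

if __name__ == "__main__":
    p = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
    q = {"a": Fraction(1, 3), "b": Fraction(1, 3), "c": Fraction(1, 3)}
    print(variational_distance(p, q), twice_sup_over_events(p, q))
```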

Hereafter, we consider the case that both the coin random number and
the target random number are general sources. The information-spectrum
approach is also quite useful for random number generation problems for
general sources. First of all, we give two fundamental lemmas for treating the
random number generation problems. Notice that, unless stated otherwise,
source alphabets $\mathcal{X}$ and $\mathcal{Y}$ are countably infinite sets throughout this chapter.



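Lemma 2.1.1. Let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ be arbitrary general
sources, where $X^n$ and $Y^n$ are random variables taking values in $\mathcal{X}^n$ and $\mathcal{Y}^n$,
respectively. Then, for all $n = 1, 2, \ldots$ and arbitrary constants $\gamma > 0$ and $a$,
there exists a mapping $\varphi_n : \mathcal{X}^n \to \mathcal{Y}^n$ satisfying

$$d(Y^n, \varphi_n(X^n)) \le 2e^{-n\gamma} + 2 \max\left( \Pr\{X^n \notin S_n(a + \gamma)\},\ \Pr\{Y^n \notin T_n(a)\} \right), \tag{2.1.3}$$

where we use the notations

$$S_n(a) = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n} \log \frac{1}{P_{X^n}(\mathbf{x})} \ge a \right\},$$

$$T_n(a) = \left\{ \mathbf{y} \in \mathcal{Y}^n \,\middle|\, \frac{1}{n} \log \frac{1}{P_{Y^n}(\mathbf{y})} < a \right\}$$

throughout this chapter.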

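Proof. We repeat the following operation so as to define the mapping $\varphi_n : \mathcal{X}^n \to \mathcal{Y}^n$. First of all, let

$$ T_n(a) = \{\mathbf{y}_1, \mathbf{y}_2, \cdots, \mathbf{y}_{M_n}\} \quad (M_n = |T_n(a)|) $$

be a list of all elements in $T_n(a)$. We can see that $T_n(a)$ is a finite set in the
following manner. Since $P_{Y^n}(\mathbf{y}) \ge e^{-na}$ for any $\mathbf{y} \in T_n(a)$, we have

$$ 1 \ge \sum_{\mathbf{y} \in T_n(a)} P_{Y^n}(\mathbf{y}) \ge \sum_{\mathbf{y} \in T_n(a)} e^{-na} = e^{-na}|T_n(a)|, $$

which implies

$$ |T_n(a)| \le e^{na}. \tag{2.1.4} $$

First, we arbitrarily choose a subset $A(1)$ of $S_n(a+\gamma)$ for $\mathbf{y}_1$ satisfying

$$ \sum_{\mathbf{x} \in A(1)} P_{X^n}(\mathbf{x}) \le P_{Y^n}(\mathbf{y}_1) $$

and

$$ P_{Y^n}(\mathbf{y}_1) < \sum_{\mathbf{x} \in A(1)} P_{X^n}(\mathbf{x}) + P_{X^n}(\mathbf{x}') $$

for any $\mathbf{x}' \in S_n(a+\gamma) - A(1)$. Next, we arbitrarily choose a subset $A(2) \subset S_n(a+\gamma) - A(1)$ for $\mathbf{y}_2$ satisfying

$$ \sum_{\mathbf{x} \in A(2)} P_{X^n}(\mathbf{x}) \le P_{Y^n}(\mathbf{y}_2) $$

and

$$ P_{Y^n}(\mathbf{y}_2) < \sum_{\mathbf{x} \in A(2)} P_{X^n}(\mathbf{x}) + P_{X^n}(\mathbf{x}') $$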



108 2 Random Number Generation 



for any $\mathbf{x}' \in S_n(a+\gamma) - A(1) \cup A(2)$. Furthermore, we choose a subset
$A(3) \subset S_n(a+\gamma) - A(1) \cup A(2)$ for $\mathbf{y}_3$ in the same way. We repeat this operation
in the order of $\mathbf{y}_1, \mathbf{y}_2, \cdots$ as long as possible. Suppose that this operation
stops at $\mathbf{y}_{i_0}$. We need to consider the following two cases:



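1) Case of $i_0 = M_n$:

Define a mapping $\varphi_n : \mathcal{X}^n \to \mathcal{Y}^n$ by

$$ \varphi_n(\mathbf{x}) = \begin{cases} \mathbf{y}_i & \text{for } \mathbf{x} \in A(i) \ (i = 1, \cdots, i_0 - 1), \\ \mathbf{y}_{i_0} & \text{for } \mathbf{x} \in \mathcal{X}^n - \bigcup_{i=1}^{i_0-1} A(i) \end{cases} $$

and set $\tilde{Y}^n = \varphi_n(X^n)$. Then, we first have

$$ \begin{aligned} d(Y^n, \tilde{Y}^n) &= \sum_{\mathbf{y} \in \mathcal{Y}^n} |P_{Y^n}(\mathbf{y}) - P_{\tilde{Y}^n}(\mathbf{y})| \\ &= \sum_{i=1}^{i_0-1} |P_{Y^n}(\mathbf{y}_i) - P_{\tilde{Y}^n}(\mathbf{y}_i)| + |P_{Y^n}(\mathbf{y}_{i_0}) - P_{\tilde{Y}^n}(\mathbf{y}_{i_0})| \\ &\quad + \sum_{\mathbf{y} \notin T_n(a)} |P_{Y^n}(\mathbf{y}) - P_{\tilde{Y}^n}(\mathbf{y})|. \end{aligned} $$

By noticing that

$$ \begin{aligned} |P_{Y^n}(\mathbf{y}_{i_0}) - P_{\tilde{Y}^n}(\mathbf{y}_{i_0})| &= \Bigl| \sum_{\mathbf{y} \ne \mathbf{y}_{i_0}} P_{Y^n}(\mathbf{y}) - \sum_{\mathbf{y} \ne \mathbf{y}_{i_0}} P_{\tilde{Y}^n}(\mathbf{y}) \Bigr| \\ &\le \sum_{i=1}^{i_0-1} |P_{Y^n}(\mathbf{y}_i) - P_{\tilde{Y}^n}(\mathbf{y}_i)| + \sum_{\mathbf{y} \notin T_n(a)} |P_{Y^n}(\mathbf{y}) - P_{\tilde{Y}^n}(\mathbf{y})|, \end{aligned} \tag{2.1.5} $$

it follows that

$$ d(Y^n, \tilde{Y}^n) \le 2\sum_{i=1}^{i_0-1} |P_{Y^n}(\mathbf{y}_i) - P_{\tilde{Y}^n}(\mathbf{y}_i)| + 2\sum_{\mathbf{y} \notin T_n(a)} |P_{Y^n}(\mathbf{y}) - P_{\tilde{Y}^n}(\mathbf{y})|. \tag{2.1.6} $$

Here, note that for $i = 1, \cdots, i_0 - 1$ we have

$$ |P_{Y^n}(\mathbf{y}_i) - P_{\tilde{Y}^n}(\mathbf{y}_i)| \le P_{X^n}(\mathbf{x}) \quad \text{for some } \mathbf{x} \in S_n(a+\gamma) $$

from the construction of $A(i)$. Since $P_{X^n}(\mathbf{x}) \le e^{-n(a+\gamma)}$ for any $\mathbf{x} \in S_n(a+\gamma)$,
it holds that

$$ |P_{Y^n}(\mathbf{y}_i) - P_{\tilde{Y}^n}(\mathbf{y}_i)| \le e^{-n(a+\gamma)}. $$

Accordingly, we have

$$ \sum_{i=1}^{i_0-1} |P_{Y^n}(\mathbf{y}_i) - P_{\tilde{Y}^n}(\mathbf{y}_i)| \le |T_n(a)|\, e^{-n(a+\gamma)}. \tag{2.1.7} $$

The combination of (2.1.7) with (2.1.4) yields

$$ \sum_{i=1}^{i_0-1} |P_{Y^n}(\mathbf{y}_i) - P_{\tilde{Y}^n}(\mathbf{y}_i)| \le e^{-n\gamma}. \tag{2.1.8} $$

On the other hand, since $P_{\tilde{Y}^n}(\mathbf{y}) = 0$ for $\mathbf{y} \notin T_n(a)$, we have

$$ \sum_{\mathbf{y} \notin T_n(a)} |P_{Y^n}(\mathbf{y}) - P_{\tilde{Y}^n}(\mathbf{y})| = \Pr\{Y^n \notin T_n(a)\}. \tag{2.1.9} $$

By substituting (2.1.8) and (2.1.9) into (2.1.6), we obtain

$$ d(Y^n, \tilde{Y}^n) \le 2e^{-n\gamma} + 2\Pr\{Y^n \notin T_n(a)\}. \tag{2.1.10} $$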

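2) Case of $i_0 < M_n$:

Define a mapping $\varphi_n : \mathcal{X}^n \to \mathcal{Y}^n$ by

$$ \varphi_n(\mathbf{x}) = \begin{cases} \mathbf{y}_i & \text{for } \mathbf{x} \in A(i) \ (i = 1, \cdots, i_0), \\ \mathbf{y}_{i_0+1} & \text{for } \mathbf{x} \in \mathcal{X}^n - \bigcup_{i=1}^{i_0} A(i) \end{cases} $$

and set $\tilde{Y}^n = \varphi_n(X^n)$. Defining $T'_n = \{\mathbf{y}_1, \cdots, \mathbf{y}_{i_0+1}\}$, it follows that

$$ \begin{aligned} d(Y^n, \tilde{Y}^n) &= \sum_{\mathbf{y} \in \mathcal{Y}^n} |P_{Y^n}(\mathbf{y}) - P_{\tilde{Y}^n}(\mathbf{y})| \\ &= \sum_{i=1}^{i_0} |P_{Y^n}(\mathbf{y}_i) - P_{\tilde{Y}^n}(\mathbf{y}_i)| + |P_{Y^n}(\mathbf{y}_{i_0+1}) - P_{\tilde{Y}^n}(\mathbf{y}_{i_0+1})| \\ &\quad + \sum_{\mathbf{y} \notin T'_n} |P_{Y^n}(\mathbf{y}) - P_{\tilde{Y}^n}(\mathbf{y})|. \end{aligned} $$

Similarly to the manner of developing (2.1.5), we have

$$ |P_{Y^n}(\mathbf{y}_{i_0+1}) - P_{\tilde{Y}^n}(\mathbf{y}_{i_0+1})| \le \sum_{i=1}^{i_0} |P_{Y^n}(\mathbf{y}_i) - P_{\tilde{Y}^n}(\mathbf{y}_i)| + \sum_{\mathbf{y} \notin T'_n} |P_{Y^n}(\mathbf{y}) - P_{\tilde{Y}^n}(\mathbf{y})|. $$

Thus, we obtain

$$ d(Y^n, \tilde{Y}^n) \le 2\sum_{i=1}^{i_0} |P_{Y^n}(\mathbf{y}_i) - P_{\tilde{Y}^n}(\mathbf{y}_i)| + 2\sum_{\mathbf{y} \notin T'_n} |P_{Y^n}(\mathbf{y}) - P_{\tilde{Y}^n}(\mathbf{y})|. \tag{2.1.11} $$

On the other hand, we obtain

$$ \sum_{i=1}^{i_0} |P_{Y^n}(\mathbf{y}_i) - P_{\tilde{Y}^n}(\mathbf{y}_i)| \le e^{-n\gamma} \tag{2.1.12} $$

similarly to (2.1.8) in Case 1). Next, the definition of $\varphi_n$ leads to

$$ \sum_{i=1}^{i_0+1} P_{Y^n}(\mathbf{y}_i) \ge \Pr\{X^n \in S_n(a+\gamma)\}. $$

By taking the fact that $P_{\tilde{Y}^n}(\mathbf{y}) = 0$ for $\mathbf{y} \notin T'_n$ into consideration, it holds
that

$$ \sum_{\mathbf{y} \notin T'_n} |P_{Y^n}(\mathbf{y}) - P_{\tilde{Y}^n}(\mathbf{y})| = \sum_{\mathbf{y} \notin T'_n} P_{Y^n}(\mathbf{y}) \le \Pr\{X^n \notin S_n(a+\gamma)\}. \tag{2.1.13} $$

By substituting (2.1.12) and (2.1.13) into (2.1.11), we obtain

$$ d(Y^n, \tilde{Y}^n) \le 2e^{-n\gamma} + 2\Pr\{X^n \notin S_n(a+\gamma)\}. \tag{2.1.14} $$

Now we can conclude that there exists a mapping $\tilde{Y}^n = \varphi_n(X^n)$ satisfying

$$ d(Y^n, \tilde{Y}^n) \le 2e^{-n\gamma} + 2\max\left(\Pr\{X^n \notin S_n(a+\gamma)\},\ \Pr\{Y^n \notin T_n(a)\}\right) \tag{2.1.15} $$

by summarizing (2.1.10) in Case 1) and (2.1.14) in Case 2). □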
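The packing construction in the proof can be tried out on small finite distributions. The sketch below is one possible reading of it, under assumptions that are not in the original: both sources are i.i.d. bit sequences of length `n`, the "arbitrary" subsets $A(i)$ are chosen greedily from the least probable remaining sequences, and all names (`greedy_mapping`, `pushforward`, the parameter values) are illustrative.

```python
import math
from itertools import product

def iid_distribution(p_one, n):
    """Distribution of n i.i.d. bits with Pr{1} = p_one, keyed by bit tuples."""
    return {bits: math.prod(p_one if b else 1.0 - p_one for b in bits)
            for bits in product((0, 1), repeat=n)}

def greedy_mapping(p_x, p_y, n, a, gamma):
    """Build phi_n by packing elements of S_n(a+gamma) under each y in T_n(a)."""
    s_list = sorted((x for x in p_x if p_x[x] <= math.exp(-n * (a + gamma))),
                    key=p_x.get)                            # S_n(a+gamma), ascending mass
    t_list = [y for y in p_y if p_y[y] >= math.exp(-n * a)]  # T_n(a)
    if not t_list:                       # degenerate case: the bound is trivially >= 2
        fallback = next(iter(p_y))
        return {x: fallback for x in p_x}
    phi, pos = {}, 0
    for i, y in enumerate(t_list):
        total, end = 0.0, pos
        # Add remaining elements while their total mass stays below P_Y(y).
        while end < len(s_list) and total + p_x[s_list[end]] <= p_y[y]:
            total += p_x[s_list[end]]
            end += 1
        if i == len(t_list) - 1 or end == len(s_list):
            default = y                  # Case 1) or Case 2): this y absorbs the rest
            break
        for x in s_list[pos:end]:
            phi[x] = y                   # A(i) is mapped to y_i
        pos = end
    for x in p_x:
        phi.setdefault(x, default)
    return phi

def pushforward(p_x, phi):
    """Distribution of phi(X)."""
    out = {}
    for x, mass in p_x.items():
        out[phi[x]] = out.get(phi[x], 0.0) + mass
    return out

def variational_distance(p, q):
    return sum(abs(p.get(z, 0.0) - q.get(z, 0.0)) for z in set(p) | set(q))

n, a, gamma = 10, 0.45, 0.2              # natural logarithms throughout
p_x = iid_distribution(0.5, n)           # coin: fair bits
p_y = iid_distribution(0.1, n)           # target: biased bits
phi = greedy_mapping(p_x, p_y, n, a, gamma)
p_tilde = pushforward(p_x, phi)
d = variational_distance(p_y, p_tilde)
pr_x_out = sum(m for m in p_x.values() if m > math.exp(-n * (a + gamma)))
pr_y_out = sum(m for m in p_y.values() if m < math.exp(-n * a))
bound = 2 * math.exp(-n * gamma) + 2 * max(pr_x_out, pr_y_out)
```

Sorting by ascending mass is a design choice that makes the maximality condition easy to enforce: once the smallest remaining element does not fit, no other remaining element fits either. At this short block length the bound in (2.1.3) is loose; it is the exponential term and the two tail probabilities that drive the asymptotics.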



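Now, let

$$ \underline{H}(\mathbf{X}) = \text{p-}\liminf_{n \to \infty} \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} $$

denote the spectral inf-entropy rate of $\mathbf{X}$ (see §1.5 in Chapter 1) and

$$ \overline{H}(\mathbf{Y}) = \text{p-}\limsup_{n \to \infty} \frac{1}{n}\log\frac{1}{P_{Y^n}(Y^n)} $$

the spectral sup-entropy rate of $\mathbf{Y}$ (see §1.3 in Chapter 1). Then, Lemma 2.1.1
immediately yields the following result.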



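Theorem 2.1.1 (Nagaoka [68]). Let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ be
arbitrary general sources. If $\underline{H}(\mathbf{X}) > \overline{H}(\mathbf{Y})$, then there exists a mapping
$\varphi_n : \mathcal{X}^n \to \mathcal{Y}^n$ satisfying $\lim_{n \to \infty} d(Y^n, \varphi_n(X^n)) = 0$.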



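Proof. Since $\underline{H}(\mathbf{X}) > \overline{H}(\mathbf{Y})$, we can find an $a$ satisfying

$$ \underline{H}(\mathbf{X}) > a + \gamma > a > \overline{H}(\mathbf{Y}) $$

by choosing a sufficiently small $\gamma > 0$. It is obvious that

$$ \lim_{n \to \infty} \Pr\{X^n \notin S_n(a+\gamma)\} = 0, $$

$$ \lim_{n \to \infty} \Pr\{Y^n \notin T_n(a)\} = 0. $$

Therefore, the claim of the theorem follows from Lemma 2.1.1. □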
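The vanishing of $\Pr\{Y^n \notin T_n(a)\}$ for $a$ above the sup-entropy rate can be observed numerically in the simplest case. The sketch below assumes an i.i.d. binary source, which is only one instance of the general sources covered by the theorem; the function name and the parameter values are illustrative. Sequences with the same number of ones share one entropy density, so the probability is an exact binomial sum.

```python
import math

def prob_outside_T(n, p_one, a):
    """Pr{Y^n not in T_n(a)} for n i.i.d. bits with Pr{1} = p_one (natural logs)."""
    total = 0.0
    for k in range(n + 1):                       # k = number of ones in y
        density = (k * math.log(1 / p_one)
                   + (n - k) * math.log(1 / (1 - p_one))) / n
        if density > a:                          # y lies outside T_n(a)
            log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                       - math.lgamma(n - k + 1)
                       + k * math.log(p_one) + (n - k) * math.log(1 - p_one))
            total += math.exp(log_pmf)           # binomial mass of weight-k sequences
    return total

# The entropy of a Bernoulli(0.1) bit is about 0.325 nats, so a = 0.45 lies above it.
p_one, a = 0.1, 0.45
tail = [prob_outside_T(n, p_one, a) for n in (10, 100, 1000)]
```

The list `tail` decreases with the block length, in line with the limit used in the proof.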



We need the following lemma for obtaining properties on the converse 
theorem of random number generation. 






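Lemma 2.1.2. Let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ be arbitrary general
sources. We use the same notation as in Lemma 2.1.1. Then, for all $n = 1, 2, \cdots$,
constants $\gamma > 0$ and $a$, and for any mapping $\varphi_n : \mathcal{X}^n \to \mathcal{Y}^n$, it holds
that

$$ d(Y^n, \varphi_n(X^n)) \ge 2\Pr\{Y^n \notin T_n(a+\gamma)\} - 2\Pr\{X^n \in S_n(a)\} - 2e^{-n\gamma}. \tag{2.1.16} $$

Remark 2.1.2. The right-hand side of (2.1.16) can also be written as

$$ 2\Pr\{X^n \notin S_n(a)\} - 2\Pr\{Y^n \in T_n(a+\gamma)\} - 2e^{-n\gamma}. \tag{2.1.17} $$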

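Proof of Lemma 2.1.2.

Define

$$ T_0 = \{\mathbf{y} \in T_n^c(a+\gamma) \mid \varphi_n^{-1}(\mathbf{y}) \cap S_n^c(a) \ne \emptyset\}. \tag{2.1.18} $$

Since $P_{Y^n}(\mathbf{y}) < e^{-n(a+\gamma)}$ for $\mathbf{y} \in T_n^c(a+\gamma)$, it follows that

$$ \Pr\{Y^n \in T_0\} = \sum_{\mathbf{y} \in T_0} P_{Y^n}(\mathbf{y}) \le \sum_{\mathbf{y} \in T_0} e^{-n(a+\gamma)} \le |T_0|\, e^{-n(a+\gamma)}. \tag{2.1.19} $$

However, $|T_0|$ cannot be greater than $|S_n^c(a)|$ from the condition $\varphi_n^{-1}(\mathbf{y}) \cap S_n^c(a) \ne \emptyset$
on the right-hand side of (2.1.18). That is, $|T_0| \le |S_n^c(a)|$. In
addition, by noticing $P_{X^n}(\mathbf{x}) > e^{-na}$ for $\mathbf{x} \in S_n^c(a)$, we have $|S_n^c(a)| \le e^{na}$,
which implies that $|T_0| \le e^{na}$. Consequently, it holds from (2.1.19) that

$$ \Pr\{Y^n \in T_0\} \le e^{-n\gamma}. $$

Thus, if we set $T_1 = T_n^c(a+\gamma) - T_0$, it follows that

$$ \begin{aligned} \Pr\{Y^n \in T_1\} &= \Pr\{Y^n \notin T_n(a+\gamma)\} - \Pr\{Y^n \in T_0\} \\ &\ge \Pr\{Y^n \notin T_n(a+\gamma)\} - e^{-n\gamma}. \end{aligned} \tag{2.1.20} $$

We notice here that, if $\mathbf{y} \in T_1$, from the definition of $T_0$ we have
$\mathbf{y} \in T_n^c(a+\gamma)$ and $\varphi_n^{-1}(\mathbf{y}) \cap S_n^c(a) = \emptyset$.

Since $\varphi_n^{-1}(\mathbf{y}) \subset S_n(a)$, setting $\tilde{Y}^n = \varphi_n(X^n)$ yields

$$ \Pr\{\tilde{Y}^n \in T_1\} \le \Pr\{X^n \in S_n(a)\}. \tag{2.1.21} $$

Hence, we obtain

$$ \begin{aligned} d(Y^n, \tilde{Y}^n) &\ge 2\left(\Pr\{Y^n \in T_1\} - \Pr\{\tilde{Y}^n \in T_1\}\right) \\ &\ge 2\left(\Pr\{Y^n \in T_1\} - \Pr\{X^n \in S_n(a)\}\right) \\ &\ge 2\left(\Pr\{Y^n \notin T_n(a+\gamma)\} - e^{-n\gamma} - \Pr\{X^n \in S_n(a)\}\right) \end{aligned} $$

from the combination of Remark 2.1.1, (2.1.20) and (2.1.21). □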



Lemma 2.1.2 immediately yields the following result. 






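Theorem 2.1.2 (Nagaoka [68]). Let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ be
arbitrary general sources. If there exists a mapping $\varphi_n : \mathcal{X}^n \to \mathcal{Y}^n$ satisfying
$\lim_{n \to \infty} d(Y^n, \varphi_n(X^n)) = 0$, then it holds that

1) $\overline{H}(\mathbf{X}) \ge \overline{H}(\mathbf{Y})$,

2) $\underline{H}(\mathbf{X}) \ge \underline{H}(\mathbf{Y})$.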

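Proof.

Proof of 1):

Since $\lim_{n \to \infty} \Pr\{X^n \in S_n(a)\} = 0$ for any $a$ satisfying $a > \overline{H}(\mathbf{X})$, Lemma
2.1.2 implies that $\lim_{n \to \infty} \Pr\{Y^n \notin T_n(a+\gamma)\} = 0$ for an arbitrary $\gamma > 0$ provided that

$$ \lim_{n \to \infty} d(Y^n, \varphi_n(X^n)) = 0. $$

Therefore, $\overline{H}(\mathbf{Y}) \le a + \gamma$. Since $a$ and $\gamma > 0$ are arbitrary, we have
$\overline{H}(\mathbf{X}) \ge \overline{H}(\mathbf{Y})$.

Proof of 2):

If we arbitrarily choose $b$ with $b < \underline{H}(\mathbf{Y})$, we have $\lim_{n \to \infty} \Pr\{Y^n \in T_n(b)\} = 0$.
Therefore, if $\lim_{n \to \infty} d(Y^n, \varphi_n(X^n)) = 0$, Lemma 2.1.2 with $a = b - \gamma$ and
Remark 2.1.2 guarantee that $\lim_{n \to \infty} \Pr\{X^n \notin S_n(b-\gamma)\} = 0$ for an arbitrary
$\gamma > 0$. That is, we have $b - \gamma \le \underline{H}(\mathbf{X})$. Since $b$ and $\gamma > 0$ are arbitrary, we
have $\underline{H}(\mathbf{X}) \ge \underline{H}(\mathbf{Y})$. □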

We can obtain the following result from this theorem. 

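Corollary 2.1.1 (Nagaoka [68]). If

$$ \lim_{n \to \infty} d(Y^n, \tilde{Y}^n) = 0 $$

for two general sources $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ and $\tilde{\mathbf{Y}} = \{\tilde{Y}^n\}_{n=1}^{\infty}$, it holds that

1) $\overline{H}(\mathbf{Y}) = \overline{H}(\tilde{\mathbf{Y}})$,

2) $\underline{H}(\mathbf{Y}) = \underline{H}(\tilde{\mathbf{Y}})$

(Fig. 2.5).

Proof. If we use $\tilde{\mathbf{Y}}$ instead of $\mathbf{X}$ and choose $\varphi_n$ as the identical mapping, 1)
in Theorem 2.1.2 implies $\overline{H}(\tilde{\mathbf{Y}}) \ge \overline{H}(\mathbf{Y})$. Next, if we change the roles of $\mathbf{Y}$
and $\tilde{\mathbf{Y}}$, we have $\overline{H}(\mathbf{Y}) \ge \overline{H}(\tilde{\mathbf{Y}})$. Consequently, $\overline{H}(\mathbf{Y}) = \overline{H}(\tilde{\mathbf{Y}})$ is established.
Claim 2) can be proved in the same way. □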

The following result also follows from Theorem 2.1.2. 






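Fig. 2.5. Information spectrum of $\mathbf{Y}$ and information spectrum of $\tilde{\mathbf{Y}}$, with $\underline{H}(\mathbf{Y}) = \underline{H}(\tilde{\mathbf{Y}})$ and $\overline{H}(\mathbf{Y}) = \overline{H}(\tilde{\mathbf{Y}})$.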



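Corollary 2.1.2 (Nagaoka [68]). Let $\varphi_n : \mathcal{X}^n \to \mathcal{Y}^n$ be an arbitrary mapping
and set $Y^n = \varphi_n(X^n)$ and $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ for a general source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$.
Then, it holds that

1) $\overline{H}(\mathbf{X}) \ge \overline{H}(\mathbf{Y})$,

2) $\underline{H}(\mathbf{X}) \ge \underline{H}(\mathbf{Y})$.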

Proof. We have only to choose $\mathbf{Y} = \{\varphi_n(X^n)\}_{n=1}^{\infty}$ as $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ in Theorem 2.1.2. □

Remark 2.1.3 (Non-increasing property of the spectral sup/inf-entropy
rate in the random number generation). Corollary 2.1.2 means
that, if we transform a random number into another random number, the
spectral sup-entropy rate and the spectral inf-entropy rate of the obtained
random number do not increase. □

Remark 2.1.4. While Theorem 2.1.1 gives a sufficient condition that there
exists a mapping $\varphi_n$ satisfying $\lim_{n \to \infty} d(Y^n, \varphi_n(X^n)) = 0$, Theorem 2.1.2 gives
a necessary condition on the existence of $\varphi_n$. However, there is a gap between
the necessary condition and the sufficient condition. "A necessary and
sufficient condition" without such a gap is given by Nagaoka and Miyake
[70]. Their result can be summarized as follows (suppose that $\mathcal{Y}$ is a finite
alphabet):

1) Sufficient condition: If there exists a joint probability distribution $P_{X^nY^n}$
whose marginal distributions are $P_{X^n}$ and $P_{Y^n}$, satisfying

$$ \text{p-}\liminf_{n \to \infty} \left( \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} - \frac{1}{n}\log\frac{1}{P_{Y^n}(Y^n)} \right) > 0 \tag{2.1.22} $$

with respect to this $P_{X^nY^n}$, then there exists a mapping $\varphi_n : \mathcal{X}^n \to \mathcal{Y}^n$
satisfying $\lim_{n \to \infty} d(Y^n, \varphi_n(X^n)) = 0$.

2) Necessary condition: If there exists a mapping $\varphi_n : \mathcal{X}^n \to \mathcal{Y}^n$ satisfying
$\lim_{n \to \infty} d(Y^n, \varphi_n(X^n)) = 0$, then there exists a joint probability distribution
$P_{X^nY^n}$, whose marginal distributions are $P_{X^n}$ and $P_{Y^n}$, satisfying

$$ \text{p-}\liminf_{n \to \infty} \left( \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} - \frac{1}{n}\log\frac{1}{P_{Y^n}(Y^n)} \right) \ge 0 \tag{2.1.23} $$

with respect to this $P_{X^nY^n}$.

This "necessary and sufficient condition" can be roughly interpreted as
follows. For an arbitrary real number $u$ the cumulative information-spectrum
of $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ from $u$ to $+\infty$ is greater than that of $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ (see
the argument after Remark 3.8.3 in §3.8 of Chapter 3). Theorem 2.1.1 can
be obtained as a consequence of the sufficient condition above since (2.1.22)
holds with respect to $P_{X^nY^n}(\mathbf{x}, \mathbf{y}) = P_{X^n}(\mathbf{x})P_{Y^n}(\mathbf{y})$ under the condition of
Theorem 2.1.1. On the other hand, it is easy to verify that, if the necessary
condition (2.1.23) holds, then conditions 1) and 2) in Theorem 2.1.2 follow. □



Corollary 2.1.1 claims that the spectral sup-entropy and inf-entropy
rates of $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ coincide with the spectral sup-entropy and inf-entropy
rates of $\tilde{\mathbf{Y}} = \{\tilde{Y}^n\}_{n=1}^{\infty}$, respectively, if the variational distance
satisfies $\lim_{n \to \infty} d(Y^n, \tilde{Y}^n) = 0$. However, if $\lim_{n \to \infty} d(Y^n, \tilde{Y}^n) = 0$, we can actually
show that not only the spectral sup-entropy and inf-entropy rates but
also the information-spectra (the distributions of entropy density rates) of
$\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ and $\tilde{\mathbf{Y}} = \{\tilde{Y}^n\}_{n=1}^{\infty}$ themselves asymptotically coincide as
$n \to \infty$. The following theorem describes such invariance of the information-spectra.
We use the following Lévy distance instead of the variational distance
as a measure between the two distributions.

Definition 2.1.2 (Lévy distance). Let $U$ and $V$ be real-valued random
variables. The infimum of $\mu > 0$ satisfying

$$ \Pr\{U \le x - \mu\} - \mu \le \Pr\{V \le x\} \le \Pr\{U \le x + \mu\} + \mu \tag{2.1.24} $$

for all $x \in \mathbf{R}$ (the set of all real numbers) is called the Lévy distance between $U$
and $V$ and is denoted by $L(U, V)$. The Lévy distance $L(U, V)$ is also denoted
by $L(P_U, P_V)$, where $P_U$ and $P_V$ denote the probability distributions of $U$ and
$V$, respectively.

We have the following theorem under this definition. 

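Theorem 2.1.3. Let $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ and $\tilde{\mathbf{Y}} = \{\tilde{Y}^n\}_{n=1}^{\infty}$ be two general
sources. If $\lim_{n \to \infty} d(Y^n, \tilde{Y}^n) = 0$, then we have

$$ \lim_{n \to \infty} L\left( \frac{1}{n}\log\frac{1}{P_{Y^n}(Y^n)},\ \frac{1}{n}\log\frac{1}{P_{\tilde{Y}^n}(\tilde{Y}^n)} \right) = 0 \tag{2.1.25} $$

with respect to the information-spectra of $\mathbf{Y}$ and $\tilde{\mathbf{Y}}$.

Remark 2.1.5. For two arbitrary sequences of real-valued random variables
(two general sources) $\mathbf{Z}_1 = \{Z_n^{(1)}\}_{n=1}^{\infty}$ and $\mathbf{Z}_2 = \{Z_n^{(2)}\}_{n=1}^{\infty}$, if $\lim_{n \to \infty} L(Z_n^{(1)}, Z_n^{(2)}) = 0$,
then it is easily verified that

$$ \text{p-}\limsup_{n \to \infty} Z_n^{(1)} = \text{p-}\limsup_{n \to \infty} Z_n^{(2)}, $$

$$ \text{p-}\liminf_{n \to \infty} Z_n^{(1)} = \text{p-}\liminf_{n \to \infty} Z_n^{(2)} $$

always hold. Hence, it is clear that Corollary 2.1.1 can be obtained as a special
case of Theorem 2.1.3. □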



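Proof of Theorem 2.1.3.

If we set $\delta_n = d(Y^n, \tilde{Y}^n)$, we have $\lim_{n \to \infty} \delta_n = 0$ from the assumption of
the theorem. First, since the variational distance can be evaluated as

$$ \begin{aligned} d(Y^n, \tilde{Y}^n) &= \sum_{\mathbf{y} \in \mathcal{Y}^n} |P_{Y^n}(\mathbf{y}) - P_{\tilde{Y}^n}(\mathbf{y})| \\ &\ge \sum_{\mathbf{y} : P_{Y^n}(\mathbf{y}) > 0} |P_{Y^n}(\mathbf{y}) - P_{\tilde{Y}^n}(\mathbf{y})| \\ &= \sum_{\mathbf{y} : P_{Y^n}(\mathbf{y}) > 0} P_{Y^n}(\mathbf{y}) \left| 1 - \frac{P_{\tilde{Y}^n}(\mathbf{y})}{P_{Y^n}(\mathbf{y})} \right|, \end{aligned} $$

it follows that

$$ \sum_{\mathbf{y} : P_{Y^n}(\mathbf{y}) > 0} P_{Y^n}(\mathbf{y}) \left| 1 - \frac{P_{\tilde{Y}^n}(\mathbf{y})}{P_{Y^n}(\mathbf{y})} \right| \le \delta_n. $$

Then, setting

$$ T_n = \left\{ \mathbf{y} \in \mathcal{Y}^n \,\middle|\, P_{Y^n}(\mathbf{y}) > 0 \text{ and } \left| 1 - \frac{P_{\tilde{Y}^n}(\mathbf{y})}{P_{Y^n}(\mathbf{y})} \right| \le \sqrt{\delta_n} \right\} \tag{2.1.26} $$

and applying Markov's inequality in Remark 1.1.1 in Chapter 1, we obtain

$$ \Pr\{Y^n \in T_n\} \ge 1 - \sqrt{\delta_n}. \tag{2.1.27} $$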

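We note here that it holds from (2.1.26) that

$$ 1 - \sqrt{\delta_n} \le \frac{P_{\tilde{Y}^n}(\mathbf{y})}{P_{Y^n}(\mathbf{y})} \le 1 + \sqrt{\delta_n} \tag{2.1.28} $$

for $\mathbf{y} \in T_n$. Then, (2.1.27) and (2.1.28) yield

$$ \Pr\{\tilde{Y}^n \in T_n\} \ge (1 - \sqrt{\delta_n})^2. \tag{2.1.29} $$

If we set

$$ S_n(x) = \left\{ \mathbf{y} \in \mathcal{Y}^n \,\middle|\, \frac{1}{n}\log\frac{1}{P_{Y^n}(\mathbf{y})} \le x \right\}, \tag{2.1.30} $$

$$ \tilde{S}_n(x) = \left\{ \mathbf{y} \in \mathcal{Y}^n \,\middle|\, \frac{1}{n}\log\frac{1}{P_{\tilde{Y}^n}(\mathbf{y})} \le x \right\} \tag{2.1.31} $$

for an arbitrary real number $x$, (2.1.27) and (2.1.28) lead to

$$ \begin{aligned} \Pr\{Y^n \in S_n(x)\} &= \sum_{\mathbf{y} \in S_n(x)} P_{Y^n}(\mathbf{y}) \\ &= \sum_{\mathbf{y} \in S_n(x) \cap T_n} P_{Y^n}(\mathbf{y}) + \sum_{\mathbf{y} \in S_n(x) \cap T_n^c} P_{Y^n}(\mathbf{y}) \\ &\le \sum_{\mathbf{y} \in S_n(x) \cap T_n} P_{Y^n}(\mathbf{y}) + \sqrt{\delta_n} \\ &\le \frac{1}{1 - \sqrt{\delta_n}} \sum_{\mathbf{y} \in S_n(x) \cap T_n} P_{\tilde{Y}^n}(\mathbf{y}) + \sqrt{\delta_n}. \end{aligned} \tag{2.1.32} $$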



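On the other hand, if $\mathbf{y} \in S_n(x) \cap T_n$, (2.1.28) implies that

$$ \frac{1}{n}\log\frac{1}{P_{\tilde{Y}^n}(\mathbf{y})} \le \frac{1}{n}\log\frac{1}{P_{Y^n}(\mathbf{y})} + \frac{1}{n}\log\frac{1}{1 - \sqrt{\delta_n}} \le x + \frac{1}{n}\log\frac{1}{1 - \sqrt{\delta_n}}. $$

Therefore, we obtain

$$ S_n(x) \cap T_n \subset \tilde{S}_n(x_1^{(n)}), \tag{2.1.33} $$

where

$$ x_1^{(n)} = x + \frac{1}{n}\log\frac{1}{1 - \sqrt{\delta_n}}. $$

Now, it follows from (2.1.32) and (2.1.33) that

$$ \begin{aligned} \Pr\{Y^n \in S_n(x)\} &\le \frac{1}{1 - \sqrt{\delta_n}} \sum_{\mathbf{y} \in \tilde{S}_n(x_1^{(n)})} P_{\tilde{Y}^n}(\mathbf{y}) + \sqrt{\delta_n} \\ &= \frac{1}{1 - \sqrt{\delta_n}} \Pr\{\tilde{Y}^n \in \tilde{S}_n(x_1^{(n)})\} + \sqrt{\delta_n}, \end{aligned} $$

which implies

$$ \Pr\{Y^n \in S_n(x)\} \le \Pr\{\tilde{Y}^n \in \tilde{S}_n(x_1^{(n)})\} + 2\sqrt{\delta_n}. \tag{2.1.34} $$

Similarly, we obtain

$$ \Pr\{Y^n \in S_n(x)\} = \sum_{\mathbf{y} \in S_n(x)} P_{Y^n}(\mathbf{y}) \ge \sum_{\mathbf{y} \in S_n(x) \cap T_n} P_{Y^n}(\mathbf{y}). \tag{2.1.35} $$

Now, set

$$ x_2^{(n)} = x - \frac{1}{n}\log(1 + \sqrt{\delta_n}). \tag{2.1.36} $$

Since (2.1.28) means that

$$ \frac{1}{n}\log\frac{1}{P_{Y^n}(\mathbf{y})} \le \frac{1}{n}\log\frac{1}{P_{\tilde{Y}^n}(\mathbf{y})} + \frac{1}{n}\log(1 + \sqrt{\delta_n}) $$

for $\mathbf{y} \in T_n$, we have $\tilde{S}_n(x_2^{(n)}) \cap T_n \subset S_n(x) \cap T_n$. Hence,
(2.1.35) can be evaluated in the following way by using (2.1.29):

$$ \begin{aligned} \Pr\{Y^n \in S_n(x)\} &\ge \frac{1}{1 + \sqrt{\delta_n}} \sum_{\mathbf{y} \in \tilde{S}_n(x_2^{(n)}) \cap T_n} P_{\tilde{Y}^n}(\mathbf{y}) \\ &\ge \frac{1}{1 + \sqrt{\delta_n}} \sum_{\mathbf{y} \in \tilde{S}_n(x_2^{(n)})} P_{\tilde{Y}^n}(\mathbf{y}) - \frac{1}{1 + \sqrt{\delta_n}} \sum_{\mathbf{y} \notin T_n} P_{\tilde{Y}^n}(\mathbf{y}) \\ &= \frac{1}{1 + \sqrt{\delta_n}} \Pr\{\tilde{Y}^n \in \tilde{S}_n(x_2^{(n)})\} - \frac{1}{1 + \sqrt{\delta_n}} \Pr\{\tilde{Y}^n \notin T_n\} \\ &\ge \frac{1}{1 + \sqrt{\delta_n}} \Pr\{\tilde{Y}^n \in \tilde{S}_n(x_2^{(n)})\} - \frac{2\sqrt{\delta_n}}{1 + \sqrt{\delta_n}}, \end{aligned} $$

which implies

$$ \Pr\{Y^n \in S_n(x)\} \ge \Pr\{\tilde{Y}^n \in \tilde{S}_n(x_2^{(n)})\} - 3\sqrt{\delta_n}. \tag{2.1.37} $$

By noticing that $\delta_n \to 0$, $x_1^{(n)} \to x$ and $x_2^{(n)} \to x$ as $n \to \infty$, the combination
of (2.1.34) and (2.1.37) leads to

$$ \lim_{n \to \infty} L\left( \frac{1}{n}\log\frac{1}{P_{Y^n}(Y^n)},\ \frac{1}{n}\log\frac{1}{P_{\tilde{Y}^n}(\tilde{Y}^n)} \right) = 0. $$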






Remark 2.1.6. So far the random variables $X^n$ and $Y^n$ of general sources
$\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ have been assumed to take values in the
$n$-th Cartesian products $\mathcal{X}^n$ and $\mathcal{Y}^n$, respectively. In fact, if $\mathcal{X}^n$ and $\mathcal{Y}^n$ are
replaced with arbitrary countably infinite sets $\mathcal{X}_n$ and $\mathcal{Y}_n$, respectively, and
$X^n$ and $Y^n$ with arbitrary random variables $X_n$ and $Y_n$ taking values in $\mathcal{X}_n$
and $\mathcal{Y}_n$, respectively, all the arguments in this chapter are still valid (see §1.12
in Chapter 1). In particular, Theorem 2.2.1 and Theorem 2.2.2, given below, are
based on such invariance. □



So far we have given general concepts and general results on the random 
number generation problems. In the following sections we consider two special 
cases with operational importance. 



2.2 Resolvability and Intrinsic Randomness 

At the beginning of the preceding section we described the example showing
how to generate a random number $Y$ subject to an arbitrary continuous
distribution from the random number $U$ uniformly distributed over the interval
$[0, 1]$. However, if the alphabet of $U$ is discrete and finite, we cannot
use such a uniform random number over a continuous interval as the coin
random number. When the alphabet is discrete and finite, we should call $U_M$
a uniform random number if $U_M$ satisfies

$$ \Pr\{U_M = i\} = \frac{1}{M} \quad (i \in \mathcal{U}_M = \{1, 2, \cdots, M\}) $$

for an arbitrary positive integer $M$. This $U_M$ is called a uniform random
number of size $M$. We note here that, if we transform $U_M$ into $Y = \varphi(U_M)$
by using an arbitrary transform $\varphi$, each probability $P_Y(y)$ becomes only
some multiple of $\frac{1}{M}$ (such a probability distribution is called the $M$-type
distribution).
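The $M$-type property is easy to check with exact rational arithmetic. The sketch below is illustrative only: the size `M = 12` and the map `i % 5` are arbitrary choices, not taken from the text.

```python
from collections import Counter
from fractions import Fraction

def pushforward_uniform(m, phi):
    """Exact distribution of phi(U_M) for U_M uniform on {1, ..., m}."""
    counts = Counter(phi(i) for i in range(1, m + 1))
    return {y: Fraction(c, m) for y, c in counts.items()}

M = 12
dist = pushforward_uniform(M, lambda i: i % 5)   # an arbitrary deterministic transform

# Every probability times M is an integer, i.e. a multiple of 1/M.
is_m_type = all((p * M).denominator == 1 for p in dist.values())
```

Whatever transform is used, the output probabilities are counts of preimages divided by $M$, which is why only $M$-type distributions can be generated exactly.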

The uniform random number plays important roles in fields such as com- 
puter science. In this section we describe random number generation problems 
related to the uniform random number. Here, note that the information- 
spectrum of the uniform random number becomes a one-point spectrum, 
where the probability is concentrated at a single point. 

We give the first definition. It formulates the problem that, given a general
source $\mathbf{Y}$, we try to generate $\mathbf{Y}$ by using a uniform random number $U_{M_n}$ of
the smallest possible size $M_n$.

Definition 2.2.1. Let $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ be an arbitrary target random number.

Rate $R$ is achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists a mapping $\varphi_n : \mathcal{U}_{M_n} \to \mathcal{Y}^n$
satisfying $\limsup_{n \to \infty} \frac{1}{n}\log M_n \le R$ and

$$ \lim_{n \to \infty} d(Y^n, \varphi_n(U_{M_n})) = 0, $$

where $U_{M_n}$ denotes the discrete uniform random number of size $M_n$ defined
above.

Definition 2.2.2 (Resolvability: variational distance).

$$ S_r(\mathbf{Y}) = \inf\{R \mid R \text{ is achievable}\}. $$



We have the following theorem under these definitions. 

Theorem 2.2.1 (Han and Verdú [46]).

$$ S_r(\mathbf{Y}) = \overline{H}(\mathbf{Y}). \tag{2.2.1} $$

Remark 2.2.1. Combining (2.2.1) with Theorem 1.3.1 in §1.3 treating the
fixed-length source coding, we obtain the formula $S_r(\mathbf{Y}) = R_f(\mathbf{Y})$. This
formula gives one of the fundamental relationships connecting random number
generation problems with source coding problems. □



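Proof of Theorem 2.2.1.

1) Direct part:

We show that $R = \overline{H}(\mathbf{Y}) + \gamma$ is achievable for an arbitrary constant $\gamma > 0$.
First, if we set $M_n = e^{nR}$, we clearly have $\limsup_{n \to \infty} \frac{1}{n}\log M_n \le R$. Next, if we
define $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ by $X^n = U_{M_n}$, it is clear that $\underline{H}(\mathbf{X}) = R$, which implies
$\underline{H}(\mathbf{X}) > \overline{H}(\mathbf{Y})$. Therefore, Theorem 2.1.1 guarantees the existence of a
mapping $\varphi_n : \mathcal{U}_{M_n} \to \mathcal{Y}^n$ satisfying $\lim_{n \to \infty} d(Y^n, \varphi_n(U_{M_n})) = 0$.

2) Converse part:

If $R$ is achievable, there exists a mapping $\varphi_n : \mathcal{U}_{M_n} \to \mathcal{Y}^n$ satisfying

$$ \limsup_{n \to \infty} \frac{1}{n}\log M_n \le R \quad \text{and} \quad \lim_{n \to \infty} d(Y^n, \varphi_n(U_{M_n})) = 0. $$

If we define $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ by $X^n = U_{M_n}$, we have $\overline{H}(\mathbf{X}) \le R$. Hence, application
of 1) in Theorem 2.1.2 leads to $R \ge \overline{H}(\mathbf{X}) \ge \overline{H}(\mathbf{Y})$, which implies
$R \ge \overline{H}(\mathbf{Y})$. □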
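The one-shot operation behind Definition 2.2.1, approximating a target distribution by the image of a uniform random number of size $M$, can be sketched as follows. This is an illustration under assumptions that are not in the original: the target is a small finite distribution given as exact fractions, and the $M$-type approximation is obtained by largest-remainder rounding, which is a simple choice and not the construction of Lemma 2.1.1.

```python
from fractions import Fraction

def m_type_counts(target, m):
    """Round m * target(y) to integers that sum to m (largest-remainder rule)."""
    scaled = {y: p * m for y, p in target.items()}
    counts = {y: int(s) for y, s in scaled.items()}          # floor, since s >= 0
    leftover = m - sum(counts.values())
    by_remainder = sorted(target, key=lambda y: scaled[y] - counts[y], reverse=True)
    for y in by_remainder[:leftover]:
        counts[y] += 1
    return counts

def mapping_from_counts(counts):
    """phi: {1, ..., M} -> Y, assigning consecutive blocks of indices to each y."""
    phi, nxt = {}, 1
    for y, c in counts.items():
        for i in range(nxt, nxt + c):
            phi[i] = y
        nxt += c
    return phi

def distance_to_target(target, counts, m):
    """Variational distance between the target and the M-type approximation."""
    return sum(abs(p - Fraction(counts[y], m)) for y, p in target.items())

# Hypothetical target distribution on three letters.
target = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
sizes = (4, 16, 64, 256)
distances = [distance_to_target(target, m_type_counts(target, m), m) for m in sizes]
phi = mapping_from_counts(m_type_counts(target, 4))
```

With this rounding each probability is off by less than $1/M$, so the distance is below (alphabet size)$/M$ and shrinks as the size of the uniform random number grows. The theorem concerns how fast $M_n$ must grow with $n$ for block sources, which this one-shot sketch does not address.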

Next, we consider the intrinsic randomness problem that is a "dual" counterpart
to this resolvability problem. In the intrinsic randomness problem,
we try to generate a uniform random number $U_{M_n}$ of the largest possible size
$M_n$ by transforming a given general source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$.






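Definition 2.2.3. Let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ be an arbitrary coin random number.

Rate $R$ is achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists a mapping $\varphi_n : \mathcal{X}^n \to \mathcal{U}_{M_n}$
satisfying $\liminf_{n \to \infty} \frac{1}{n}\log M_n \ge R$ and

$$ \lim_{n \to \infty} d(U_{M_n}, \varphi_n(X^n)) = 0. $$

Definition 2.2.4 (Intrinsic randomness: variational distance).

$$ S_\iota(\mathbf{X}) = \sup\{R \mid R \text{ is achievable}\}. $$

We have the following theorem under these definitions.

Theorem 2.2.2 (Vembu and Verdú [88]).

$$ S_\iota(\mathbf{X}) = \underline{H}(\mathbf{X}). \tag{2.2.2} $$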

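Proof.

1) Direct part:

We show that $R = \underline{H}(\mathbf{X}) - \gamma$ is achievable for an arbitrary constant
$\gamma > 0$. Setting $M_n = e^{nR}$, it is trivial that $\liminf_{n \to \infty} \frac{1}{n}\log M_n \ge R$. If we define
$\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ by $Y^n = U_{M_n}$, we clearly have $\overline{H}(\mathbf{Y}) = R$, which yields
$\overline{H}(\mathbf{Y}) < \underline{H}(\mathbf{X})$. Therefore, Theorem 2.1.1 guarantees the existence of a mapping
$\varphi_n : \mathcal{X}^n \to \mathcal{U}_{M_n}$ satisfying $\lim_{n \to \infty} d(U_{M_n}, \varphi_n(X^n)) = 0$.

2) Converse part:

If $R$ is achievable, then there exists a mapping $\varphi_n : \mathcal{X}^n \to \mathcal{U}_{M_n}$ satisfying

$$ \liminf_{n \to \infty} \frac{1}{n}\log M_n \ge R \quad \text{and} \quad \lim_{n \to \infty} d(U_{M_n}, \varphi_n(X^n)) = 0. $$

If we define $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ by $Y^n = U_{M_n}$, we have $\underline{H}(\mathbf{Y}) \ge R$. Then,
$\underline{H}(\mathbf{X}) \ge \underline{H}(\mathbf{Y}) \ge R$ follows from 2) in Theorem 2.1.2. Hence, $\underline{H}(\mathbf{X}) \ge R$ is
obtained. □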



Remark 2.2.2. Theorem 2.2.1 and Theorem 2.2.2 claim that, given a random
number $\mathbf{Z}$, a uniform random number with rate $\overline{H}(\mathbf{Z})$ is required to
generate $\mathbf{Z}$, though we can only generate a uniform random number with
rate $\underline{H}(\mathbf{Z})$ from $\mathbf{Z}$. We can interpret the difference $\overline{H}(\mathbf{Z}) - \underline{H}(\mathbf{Z})$ in the following
way. Generally, randomness contained in $\mathbf{Z}$ consists of the "clean"
randomness $\underline{H}(\mathbf{Z})$ that can always be transformed into a uniform random
number if necessary and the "dirty" randomness $\overline{H}(\mathbf{Z}) - \underline{H}(\mathbf{Z})$ that cannot
be transformed into a uniform random number any more. From Remark 2.1.3
we know that not only the sum $\overline{H}(\mathbf{Z})$ of these two kinds of randomness but
also the clean randomness $\underline{H}(\mathbf{Z})$ generally decreases by transformations of
the random number. However, the dirty randomness $\overline{H}(\mathbf{Z}) - \underline{H}(\mathbf{Z})$ does not
always decrease; it sometimes increases. □
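A source with a nonzero gap between the two rates can be examined numerically. The sketch below uses a mixed source, an equal-weight mixture of two i.i.d. bit sources; this example and all names and parameter values are assumptions chosen for illustration and do not come from this section. For such a source the information spectrum is expected to split into two clusters near the two component entropies, so the larger one plays the role of the sup-entropy rate and the smaller one that of the inf-entropy rate.

```python
import math

def binary_entropy(p):
    """Entropy of a Bernoulli(p) bit in nats."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def mixture_spectrum(n, p1, p2, w=0.5):
    """Pairs (entropy density, probability) for the mixed source at length n.

    With probability w the source emits n i.i.d. Bernoulli(p1) bits, otherwise
    n i.i.d. Bernoulli(p2) bits. Sequences with the same number of ones k have
    the same probability, so the spectrum is grouped by k.
    """
    points = []
    for k in range(n + 1):
        seq_prob = (w * p1**k * (1 - p1)**(n - k)
                    + (1 - w) * p2**k * (1 - p2)**(n - k))
        density = -math.log(seq_prob) / n          # (1/n) log 1/P(x)
        points.append((density, math.comb(n, k) * seq_prob))
    return points

def mass_near(points, center, width):
    """Probability that the entropy density lies within `width` of `center`."""
    return sum(prob for density, prob in points if abs(density - center) <= width)

n, p1, p2 = 400, 0.1, 0.5
points = mixture_spectrum(n, p1, p2)
low = mass_near(points, binary_entropy(p1), 0.1)    # cluster near the smaller entropy
high = mass_near(points, binary_entropy(p2), 0.1)   # cluster near the larger entropy
```

Each cluster carries roughly half of the probability, so neither can be neglected. In the language of the remark, the lower cluster bounds the clean randomness while the distance between the clusters corresponds to the dirty part.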






Now, we consider the intrinsic randomness problem with the normalized
divergence distance $\frac{1}{n}D(\varphi_n(X^n) \| U_{M_n})$ instead of the variational distance
$d(U_{M_n}, \varphi_n(X^n))$ in Definition 2.2.3. To define this problem, we have only to
use the following definitions instead of Definition 2.2.3 and Definition 2.2.4,
respectively.

Definition 2.2.5. Let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ be an arbitrary coin random number.

Rate $R$ is achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists a mapping $\varphi_n : \mathcal{X}^n \to \mathcal{U}_{M_n}$
satisfying $\liminf_{n \to \infty} \frac{1}{n}\log M_n \ge R$ and

$$ \lim_{n \to \infty} \frac{1}{n} D(\varphi_n(X^n) \| U_{M_n}) = 0. $$

Definition 2.2.6 (Intrinsic randomness: normalized divergence distance).

$$ S_\iota^*(\mathbf{X}) = \sup\{R \mid R \text{ is achievable}\}. $$
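For a uniform reference distribution the divergence has a closed form, $D(P \| U_M) = \log M - H(P)$, which is the identity behind (2.2.8). The sketch below checks it numerically; the output distribution `p_out` and the values of `m` and `n` are arbitrary illustrative choices.

```python
import math

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def divergence_from_uniform(p, m):
    """D(P || U_M) in nats for P supported inside {1, ..., M}."""
    return sum(v * math.log(v * m) for v in p.values() if v > 0)

def normalized_divergence(p, m, n):
    """(1/n) D(P || U_M), the closeness measure of Definition 2.2.5."""
    return divergence_from_uniform(p, m) / n

m, n = 8, 3
# Hypothetical distribution of phi_n(X^n) on {1, ..., 8}.
p_out = {1: 0.30, 2: 0.20, 3: 0.15, 4: 0.10, 5: 0.10, 6: 0.05, 7: 0.05, 8: 0.05}
uniform = {i: 1.0 / m for i in range(1, m + 1)}

div = divergence_from_uniform(p_out, m)
gap = math.log(m) - entropy(p_out)
```

Here `div` and `gap` agree up to rounding, and the divergence of the uniform distribution from itself is zero, so driving the normalized divergence to zero amounts to making the entropy rate of the output approach $\frac{1}{n}\log M_n$.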



Then, we have the following theorem corresponding to Theorem 2.2.2. The 
obtained theorem is deeply related to Theorem 2.6.4 in §2.6 on the fixed- 
length source coding problem. 

Theorem 2.2.3 (Vembu and Verdú [88]).

$$ S_\iota^*(\mathbf{X}) = \underline{H}(\mathbf{X}). \tag{2.2.3} $$



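Proof.

1) Direct part:

If $\underline{H}(\mathbf{X}) = 0$, the claim of the direct part trivially follows. Therefore,
we can assume that $\underline{H}(\mathbf{X}) > 0$. For an arbitrary small constant $\gamma > 0$ we
define $R = \underline{H}(\mathbf{X}) - 2\gamma > 0$ and $M_n = e^{nR}$. In Lemma 2.1.1 we consider
$\mathcal{U}_{M_n} = \{1, 2, \cdots, M_n\}$ and $U_{M_n}$, instead of $T_n(a)$ and $Y^n$, respectively, and
set $a = \underline{H}(\mathbf{X}) - 2\gamma$. In addition, we set $\tilde{Y}^n = \varphi_n(X^n)$ for the mapping
$\varphi_n : \mathcal{X}^n \to \mathcal{Y}^n = \mathcal{U}_{M_n}$ constructed in the proof of Lemma 2.1.1 (we define
$i_0 - 1$ as $i_0$ in Case 1)). Since we have $\Pr\{Y^n \notin T_n(a)\} = 0$, it follows that

$$ P_{\tilde{Y}^n}(i) \le P_{Y^n}(i) \quad (1 \le \forall i \le i_0 < M_n), $$

$$ P_{\tilde{Y}^n}(i_0 + 1) \le P_{Y^n}(i_0 + 1) + e^{-n\gamma} + \Pr\{X^n \notin S_n(a+\gamma)\}. $$

By taking $P_{Y^n}(i_0 + 1) = e^{-nR}$ into consideration, we can evaluate the normalized
divergence distance in the following way:

$$ \begin{aligned} \frac{1}{n} D(\varphi_n(X^n) \| U_{M_n}) &= \frac{1}{n} \sum_{i=1}^{i_0+1} P_{\tilde{Y}^n}(i) \log\frac{P_{\tilde{Y}^n}(i)}{P_{Y^n}(i)} \\ &\le \frac{1}{n} P_{\tilde{Y}^n}(i_0+1) \log\frac{P_{\tilde{Y}^n}(i_0+1)}{P_{Y^n}(i_0+1)} \\ &\le \frac{1}{n} P_{\tilde{Y}^n}(i_0+1) \log\frac{1}{e^{-nR}} \\ &\le R\left(e^{-nR} + e^{-n\gamma} + \Pr\{X^n \notin S_n(a+\gamma)\}\right), \end{aligned} \tag{2.2.4} $$

which corresponds to (2.1.14). However, since $a + \gamma = \underline{H}(\mathbf{X}) - \gamma$, it holds
that

$$ \lim_{n \to \infty} \Pr\{X^n \notin S_n(a+\gamma)\} = 0. $$

Hence, we obtain from (2.2.4) that

$$ \lim_{n \to \infty} \frac{1}{n} D(\varphi_n(X^n) \| U_{M_n}) = 0. $$

That is, any rate $R$ satisfying $R < \underline{H}(\mathbf{X})$ is achievable.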

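2) Converse part:

Suppose that a rate $R$ is achievable. That is, suppose that there exists
$\varphi_n : \mathcal{X}^n \to \mathcal{U}_{M_n}$ satisfying

$$ \liminf_{n \to \infty} \frac{1}{n}\log M_n \ge R \tag{2.2.5} $$

and

$$ \lim_{n \to \infty} \frac{1}{n} D(\varphi_n(X^n) \| U_{M_n}) = 0. \tag{2.2.6} $$

Setting

$$ \delta_n = \frac{1}{n} D(\varphi_n(X^n) \| U_{M_n}), \tag{2.2.7} $$

(2.2.6) shows that $\lim_{n \to \infty} \delta_n = 0$. Here, if we note that

$$ P_{U_{M_n}}(i) = \frac{1}{M_n} \quad (i \in \mathcal{U}_{M_n} = \{1, 2, \cdots, M_n\}), $$

(2.2.7) can be written as

$$ \frac{1}{n}\log M_n = \frac{1}{n} H(\varphi_n(X^n)) + \delta_n. \tag{2.2.8} $$

Then, we set

$$ S_n = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n}\log\frac{1}{P_{X^n}(\mathbf{x})} \le \underline{H}(\mathbf{X}) + \gamma \right\} \tag{2.2.9} $$

for an arbitrarily small constant $\gamma > 0$ and partition $\mathcal{U}_{M_n}$ into the following
two subsets:

$$ \mathcal{U}_{M_n}^{(1)} = \{m \in \mathcal{U}_{M_n} \mid \varphi_n^{-1}(m) \cap S_n \ne \emptyset\}, $$

$$ \mathcal{U}_{M_n}^{(2)} = \{m \in \mathcal{U}_{M_n} \mid \varphi_n^{-1}(m) \cap S_n = \emptyset\}. $$

Now, we introduce a random variable $Q_n$ that equals 1 if $\varphi_n(X^n) \in \mathcal{U}_{M_n}^{(1)}$ and 2 if
$\varphi_n(X^n) \in \mathcal{U}_{M_n}^{(2)}$. Then, it follows that

$$ \begin{aligned} H(\varphi_n(X^n)) &= H(Q_n) + H(\varphi_n(X^n) \mid Q_n) \\ &= H(Q_n) + \lambda_n H(\varphi_n(X^n) \mid Q_n = 1) \\ &\quad + (1 - \lambda_n) H(\varphi_n(X^n) \mid Q_n = 2), \end{aligned} \tag{2.2.10} $$

where $\lambda_n = \Pr\{\varphi_n(X^n) \in \mathcal{U}_{M_n}^{(1)}\}$. It is clear from the definition of $\lambda_n$ that

$$ \lambda_n \ge \Pr\{X^n \in S_n\}. \tag{2.2.11} $$

On the other hand, since $|S_n| \le e^{n(\underline{H}(\mathbf{X}) + \gamma)}$, we have $\|\varphi_n(X^n)\| \le e^{n(\underline{H}(\mathbf{X}) + \gamma)}$
under $Q_n = 1$, where for a mapping $f$, $\|f\|$ denotes the size of the range of
$f$. Thus, the second term on the right-hand side of (2.2.10) is evaluated as

$$ H(\varphi_n(X^n) \mid Q_n = 1) \le n(\underline{H}(\mathbf{X}) + \gamma). \tag{2.2.12} $$

Furthermore, by noticing that $\|\varphi_n(X^n)\| \le M_n$ for the case of $Q_n = 2$, the
third term on the right-hand side of (2.2.10) is upper-bounded as

$$ H(\varphi_n(X^n) \mid Q_n = 2) \le \log M_n. \tag{2.2.13} $$

In addition, we trivially have

$$ H(Q_n) \le \log 2. \tag{2.2.14} $$

By substituting (2.2.12)-(2.2.14) into the right-hand side of (2.2.10), we obtain

$$ \frac{1}{n} H(\varphi_n(X^n)) \le \frac{1}{n}\log 2 + \lambda_n(\underline{H}(\mathbf{X}) + \gamma) + \frac{1 - \lambda_n}{n}\log M_n. $$

This, together with (2.2.8), yields

$$ \frac{\lambda_n}{n}\log M_n \le \lambda_n(\underline{H}(\mathbf{X}) + \gamma) + \frac{1}{n}\log 2 + \delta_n. \tag{2.2.15} $$

On the other hand, from (2.2.11) and the definition of $S_n$ in (2.2.9), there
must exist a sequence of integers $n_1 < n_2 < \cdots \to \infty$ such that

$$ \lambda_{n_i} \ge \exists \lambda_0 > 0 \quad (\forall i = 1, 2, \cdots). $$

Then, (2.2.15) with $n = n_i$ implies

$$ \begin{aligned} \frac{1}{n_i}\log M_{n_i} &\le \underline{H}(\mathbf{X}) + \gamma + \frac{1}{n_i \lambda_{n_i}}\log 2 + \frac{\delta_{n_i}}{\lambda_{n_i}} \\ &\le \underline{H}(\mathbf{X}) + \gamma + \frac{1}{n_i \lambda_0}\log 2 + \frac{\delta_{n_i}}{\lambda_0}. \end{aligned} $$

Hence, we obtain

$$ \liminf_{i \to \infty} \frac{1}{n_i}\log M_{n_i} \le \underline{H}(\mathbf{X}) + \gamma, $$

where $\lim_{n \to \infty} \delta_n = 0$ is used. Here, since $\gamma > 0$ is arbitrary, letting $\gamma \to 0$ yields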



$$ \liminf_{i \to \infty} \frac{1}{n_i}\log M_{n_i} \le \underline{H}(\mathbf{X}). \tag{2.2.16} $$

Then, we obtain from (2.2.5) and (2.2.16) that

$$ R \le \liminf_{n \to \infty} \frac{1}{n}\log M_n \le \liminf_{i \to \infty} \frac{1}{n_i}\log M_{n_i} \le \underline{H}(\mathbf{X}), $$

which establishes $R \le \underline{H}(\mathbf{X})$. That is, any achievable rate $R$ cannot be greater
than $\underline{H}(\mathbf{X})$. □

Remark 2.2.3. In the intrinsic randomness problem, $S_\iota^*(\mathbf{X})$ based on the
normalized divergence distance coincides with $S_\iota(\mathbf{X})$ based on the variational
distance. On the other hand, the situation becomes much different
in the resolvability problem. Consider the case that the variational distance
$d(Y^n, \varphi_n(U_{M_n}))$ in Definition 2.2.1 is replaced with the normalized
divergence distance $\frac{1}{n}D(\varphi_n(U_{M_n}) \| Y^n)$ and $S_r(\mathbf{Y})$ in Definition 2.2.2 is represented as
$S_r^*(\mathbf{Y})$. If $\mathbf{Y}$ is a stationary source, $S_r^*(\mathbf{Y}) \le \overline{H}(\mathbf{Y})$ always holds. However,
$S_r^*(\mathbf{Y}) \ge \overline{H}(\mathbf{Y})$, the inequality in the opposite direction, does not always
hold. The essential part of this problem can be understood well under a generalized
formulation called the $\delta$-resolvability. We will give a formula for the
generalized problem in Theorem 2.4.3 in §2.4 (see Remark 2.4.3). □

2.3 Strong Converse Theorem for Random Number
Generation

Theorem 2.2.1 on the resolvability (variational distance) given in the preceding
section consists of the direct part and the converse part. What does the
theorem become if we require the strong converse property in the converse
part? This section is devoted to investigation of the strong converse property.

Definition 2.3.1. In the resolvability problem the target random number
$\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ is said to satisfy the strong converse property if for any
$R$ satisfying $R < S_r(\mathbf{Y})$, it holds that

$$ \lim_{n \to \infty} d(Y^n, \varphi_n(U_{M_n})) = 2 \tag{2.3.1} $$

for all $M_n$ satisfying

$$ \limsup_{n \to \infty} \frac{1}{n}\log M_n \le R $$

and mappings $\varphi_n : \mathcal{U}_{M_n} \to \mathcal{Y}^n$. □






Remark 2.3.1. The variational distance always satisfies $0 \le d(Y^n, \varphi_n(U_{M_n})) \le 2$.
Therefore, (2.3.1) means that the probability distributions of $Y^n$ and
$\varphi_n(U_{M_n})$ are completely separated asymptotically in the sense that the variational
distance between them is maximized. □



The following theorem holds on the strong converse property defined above. 

Theorem 2.3.1 (Strong converse theorem). The target random number
$\mathbf{Y}$ satisfies the strong converse property if and only if it satisfies

$$ \overline{H}(\mathbf{Y}) = \underline{H}(\mathbf{Y}). $$

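Proof.

1) Sufficiency:

Assume that $\overline{H}(\mathbf{Y}) = \underline{H}(\mathbf{Y})$. Set $R = \overline{H}(\mathbf{Y}) - 3\gamma$ ($\gamma > 0$) and define $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$
by $X^n = U_{M_n}$ for an arbitrary $M_n$ satisfying $\limsup_{n \to \infty} \frac{1}{n}\log M_n \le R$.
Since $\overline{H}(\mathbf{X}) \le R$, we have

$$ \lim_{n \to \infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \ge R + \gamma \right\} = 0. $$

Let $\varphi_n : \mathcal{U}_{M_n} \to \mathcal{Y}^n$ be an arbitrary mapping. Then, Lemma 2.1.2 with
$a = R + \gamma$ implies

$$ \liminf_{n \to \infty} d(Y^n, \varphi_n(U_{M_n})) \ge 2\liminf_{n \to \infty} \Pr\{Y^n \notin T_n(R + 2\gamma)\}. $$

We notice here that $R + 2\gamma = \overline{H}(\mathbf{Y}) - \gamma = \underline{H}(\mathbf{Y}) - \gamma$ from the assumption
of $\overline{H}(\mathbf{Y}) = \underline{H}(\mathbf{Y})$. Therefore,

$$ \lim_{n \to \infty} \Pr\{Y^n \notin T_n(R + 2\gamma)\} = 1, $$

which leads to $\lim_{n \to \infty} d(Y^n, \varphi_n(U_{M_n})) = 2$.

2) Necessity:

Set $R = \overline{H}(\mathbf{Y}) - \gamma$ ($\gamma > 0$) and define $M_n = e^{nR}$. Trivially, we have

$$ \limsup_{n \to \infty} \frac{1}{n}\log M_n \le R. $$

If we define $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ by $X^n = U_{M_n}$, we have

$$ \lim_{n \to \infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)} \le R - \gamma \right\} = 0 $$

from $\underline{H}(\mathbf{X}) = R$. Therefore, setting $a = R - 2\gamma$ and denoting the mapping
defined in Lemma 2.1.1 by $\varphi_n : \mathcal{U}_{M_n} \to \mathcal{Y}^n$, it holds that

$$ \liminf_{n \to \infty} d(Y^n, \varphi_n(U_{M_n})) \le 2\liminf_{n \to \infty} \Pr\{Y^n \notin T_n(R - 2\gamma)\}. $$

If the strong converse theorem holds, the left-hand side must be equal to 2.
Hence,

$$ \lim_{n \to \infty} \Pr\{Y^n \notin T_n(R - 2\gamma)\} = 1, $$

which implies $\underline{H}(\mathbf{Y}) \ge R - 2\gamma = \overline{H}(\mathbf{Y}) - 3\gamma$. By noting that $\gamma > 0$ is arbitrary,
this means $\underline{H}(\mathbf{Y}) \ge \overline{H}(\mathbf{Y})$. Since $\underline{H}(\mathbf{Y}) \le \overline{H}(\mathbf{Y})$, the inequality in the
opposite direction, always holds, $\underline{H}(\mathbf{Y}) = \overline{H}(\mathbf{Y})$ follows. □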

Next, let us define the strong converse property on the intrinsic random- 
ness (variational distance). 

Definition 2.3.2. In the intrinsic randomness problem the coin random
number $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is said to satisfy the strong converse property if
for any $R$ satisfying $R > S_\iota(\mathbf{X})$, it holds that

$$ \lim_{n \to \infty} d(U_{M_n}, \varphi_n(X^n)) = 2 $$

for all $M_n$ satisfying $\liminf_{n \to \infty} \frac{1}{n}\log M_n \ge R$ and mappings $\varphi_n : \mathcal{X}^n \to \mathcal{U}_{M_n}$.

Theorem 2.3.2 (Strong converse theorem). The coin random number
$\mathbf{X}$ satisfies the strong converse property if and only if it satisfies

$$ \overline{H}(\mathbf{X}) = \underline{H}(\mathbf{X}). $$

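Proof.

1) Sufficiency:

Assume that $\overline{H}(\mathbf{X}) = \underline{H}(\mathbf{X})$. Set $R = \underline{H}(\mathbf{X}) + 3\gamma$ ($\gamma > 0$) and define $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$
by $Y^n = U_{M_n}$ for an arbitrary $M_n$ satisfying $\liminf_{n \to \infty} \frac{1}{n}\log M_n \ge R$.
Since $\underline{H}(\mathbf{Y}) \ge R$, we have

$$ \lim_{n \to \infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{Y^n}(Y^n)} \le R - \gamma \right\} = 0. $$

Let $\varphi_n : \mathcal{X}^n \to \mathcal{U}_{M_n}$ be an arbitrary mapping and set $a = R - 2\gamma$. Then,
Lemma 2.1.2 and Remark 2.1.2 yield

$$ \liminf_{n \to \infty} d(U_{M_n}, \varphi_n(X^n)) \ge 2\liminf_{n \to \infty} \Pr\{X^n \notin S_n(R - 2\gamma)\}. $$

We notice here that $R - 2\gamma = \underline{H}(\mathbf{X}) + \gamma = \overline{H}(\mathbf{X}) + \gamma$ follows from the
assumption of $\underline{H}(\mathbf{X}) = \overline{H}(\mathbf{X})$. Therefore,

$$ \lim_{n \to \infty} \Pr\{X^n \notin S_n(R - 2\gamma)\} = 1, $$

which implies $\lim_{n \to \infty} d(U_{M_n}, \varphi_n(X^n)) = 2$.

2) Necessity:

Set $R = \underline{H}(\mathbf{X}) + \gamma$ ($\gamma > 0$) and define $M_n = e^{nR}$. Trivially, $\liminf_{n \to \infty} \frac{1}{n}\log M_n \ge R$
is satisfied. Defining $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ by $Y^n = U_{M_n}$,

$$ \lim_{n \to \infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{Y^n}(Y^n)} \ge R + \gamma \right\} = 0 $$

follows from $\overline{H}(\mathbf{Y}) = R$. Accordingly, if we set $a = R + \gamma$ and denote by
$\varphi_n : \mathcal{X}^n \to \mathcal{U}_{M_n}$ the mapping defined in Lemma 2.1.1, it holds that

$$ \liminf_{n \to \infty} d(U_{M_n}, \varphi_n(X^n)) \le 2\liminf_{n \to \infty} \Pr\{X^n \notin S_n(R + 2\gamma)\}. $$

Note that the right-hand side must be equal to 2 if the strong converse property
is satisfied. Therefore,

$$ \lim_{n \to \infty} \Pr\{X^n \notin S_n(R + 2\gamma)\} = 1, $$

which implies $R + 2\gamma \ge \overline{H}(\mathbf{X})$. Since $R = \underline{H}(\mathbf{X}) + \gamma$, we have

$$ \underline{H}(\mathbf{X}) + 3\gamma \ge \overline{H}(\mathbf{X}). $$

By noting that $\gamma > 0$ is arbitrary, we obtain $\underline{H}(\mathbf{X}) \ge \overline{H}(\mathbf{X})$. Since the inequality
in the opposite direction always holds, $\underline{H}(\mathbf{X}) = \overline{H}(\mathbf{X})$ is established. □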



Remark 2.3.2. While the resolvability problem and the intrinsic randomness
problem for stationary ergodic sources satisfy the strong converse property,
the strong converse property is not satisfied for mixed sources. This fact
is parallel to the strong converse property on the fixed-length source coding
considered in §1.5. □



The strong converse property on the random numbers defined in this
section has the following operational meanings.

1) For a given uniform random number $\mathbf{U} = \{U_{M_n}\}_{n=1}^{\infty}$ of size $M_n = e^{nR}$,
we generate another random number $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ by applying a transform
$\varphi_n$ to $\mathbf{U}$, where $Y^n = \varphi_n(U_{M_n})$. After that, by applying a transform $\psi_n$
to $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$, we try to generate a random number $\tilde{\mathbf{U}} = \{\psi_n(Y^n)\}_{n=1}^{\infty}$
that is arbitrarily close to the original random number $\mathbf{U}$ in the sense of
$\lim_{n \to \infty} d(U_{M_n}, \psi_n(Y^n)) = 0$ (this operation is called the reproduction of a random
number). Then, the following question arises: can we always reproduce
a random number? If the reproduction is possible, it must hold from Theorem
2.2.1 and Theorem 2.2.2 that

$$ R \ge \overline{H}(\mathbf{Y}) \ge \underline{H}(\mathbf{Y}) \ge R. \tag{2.3.2} $$

Therefore, we have

$$ \overline{H}(\mathbf{Y}) = \underline{H}(\mathbf{Y}) = R, \tag{2.3.3} $$

which means that $\mathbf{Y}$ must satisfy the strong converse property as the target
random number of $\mathbf{U}$ (Theorem 2.3.1).

Is it always possible to reproduce the uniform random number $\mathbf{U}$ if $\mathbf{Y}$
satisfies the strong converse property and (2.3.3)? In fact, though it is impossible
to reproduce $\mathbf{U}$ itself, we can reproduce $\mathbf{U}$ in the following sense. That
is, letting $\gamma > 0$ be an arbitrary small constant and defining $M'_n = e^{n(R - \gamma)}$,
there exists a transform $\psi_n$ that generates a uniform random number $U_{M'_n}$, instead
of $U_{M_n}$, satisfying $\lim_{n \to \infty} d(U_{M'_n}, \psi_n(Y^n)) = 0$, of size $M'_n$ slightly smaller
than $M_n$. This fact is guaranteed by Theorem 2.2.2 because $\underline{H}(\mathbf{Y}) > R - \gamma$
holds from (2.3.3).

2) Next, for a given random number $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ we generate a random
number $\varphi_n(X^n)$ by applying a transform $\varphi_n$ to $\mathbf{X}$ that is arbitrarily close to
a uniform random number $\mathbf{U} = \{U_{M_n}\}_{n=1}^{\infty}$ of size $M_n = e^{nR}$ in the sense
of $\lim_{n\to\infty} d(U_{M_n}, \varphi_n(X^n)) = 0$. We then try to reproduce the original random
number $\mathbf{X}$ by applying a transform $\psi_n$ to $\tilde{\mathbf{U}} = \{\tilde{U}_{M_n} = \varphi_n(X^n)\}_{n=1}^{\infty}$. By
using an argument similar to 1) and Corollary 2.1.1, we can see that the
condition

    $\overline{H}(\mathbf{X}) = \underline{H}(\mathbf{X}) = R$    (2.3.4)

must hold for the existence of such a transform $\psi_n$. Accordingly, $\mathbf{X}$ must
satisfy the strong converse property as the coin random number of $\mathbf{U}$
(Theorem 2.3.2). Now, suppose that $\mathbf{X}$ satisfies the strong converse property and
the condition (2.3.4). Though we cannot reproduce $\mathbf{X}$ itself from $\tilde{\mathbf{U}}$ either,
we can reproduce $\mathbf{X}$ in the following sense. Letting $\gamma > 0$ be an arbitrarily
small constant and defining $M'_n = e^{n(R+\gamma)}$, there exists a transform $\psi_n$ that,
instead of $U_{M_n}$, takes a uniform random number $U_{M'_n}$ of size $M'_n$, which is
slightly greater than $M_n$, and satisfies $\lim_{n\to\infty} d(X^n, \psi_n(U_{M'_n})) = 0$. This fact
is guaranteed by Theorem 2.2.1 because $\overline{H}(\mathbf{X}) < R + \gamma$ holds from (2.3.4).

3) These are the operational meanings of the strong converse properties on 
the target random number and the coin random number. Summarizing, the 
strong converse property of a random number Z is the condition that the 
operations transforming a uniform random number into Z or Z into a uniform 
random number are approximately reversible. In this reversible process all the 
randomness is kept clean without decreasing its rate (see Remark 2.2.2).



2.4 δ-Random Number Generation

In the resolvability problem given in §2.2 we require that the variational
distance between the probability distributions satisfies

    $\delta_n \equiv d(Y^n, \varphi_n(U_{M_n})) \to 0 \quad (n \to \infty)$

for a mapping $\varphi_n : \mathcal{U}_{M_n} \equiv \{1, 2, \cdots, M_n\} \to \mathcal{Y}^n$. In this section, however,
we consider the resolvability problem under a weakened requirement. We require
that $\delta_n$ satisfies

    $\limsup_{n\to\infty} \delta_n \le \delta$

for an arbitrary constant $0 \le \delta < 2$. We can expect to have a smaller
achievable rate under this weakened requirement on the probability distribution
approximation. We give the following definitions.

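Definition 2.4.1.

    Rate $R$ is $\delta$-achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists a mapping
    $\varphi_n : \mathcal{U}_{M_n} \to \mathcal{Y}^n$ satisfying $\limsup_{n\to\infty} \frac{1}{n} \log M_n \le R$ and

    $\limsup_{n\to\infty} d(Y^n, \varphi_n(U_{M_n})) \le \delta.$

Definition 2.4.2 (δ-resolvability: variational distance).

    $S_r(\delta|\mathbf{Y}) = \inf \{R \mid R \text{ is } \delta\text{-achievable}\}.$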
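
To make Definition 2.4.1 concrete, the following Python sketch evaluates the variational distance between a target distribution and the image of a uniform random number of size M under one particular mapping. The rounding rule (largest-remainder rounding of M·P(y)) and the function names are assumptions chosen for illustration; this is not the construction of Lemma 2.1.1, only a small instance of the quantity that the definition constrains.

```python
from fractions import Fraction
from math import floor

def rounding_map_counts(p, M):
    """How many of the M uniform indices are sent to each target symbol
    (hypothetical rule: largest-remainder rounding of M * p[y])."""
    scaled = [M * q for q in p]
    counts = [floor(s) for s in scaled]
    leftover = M - sum(counts)
    # Give the remaining indices to the symbols with the largest fractional parts.
    order = sorted(range(len(p)), key=lambda i: scaled[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

def variational_distance(p, q):
    """d(P, Q) = sum over symbols of |P(y) - Q(y)|, which ranges over [0, 2]."""
    return sum(abs(a - b) for a, b in zip(p, q))

def distance_for_size(p, M):
    """Variational distance between p and the distribution induced by the mapping."""
    counts = rounding_map_counts(p, M)
    induced = [Fraction(c, M) for c in counts]
    return variational_distance(p, induced)
```

With exact `Fraction` inputs the distance is computed without rounding error, so one can tabulate it against M and see which sizes keep it below a chosen δ.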

Then, we obtain the following theorem under these definitions, which can be
regarded as a generalization of Theorem 2.2.1.

Theorem 2.4.1 (Steinberg and Verdú [85]).

    $S_r(\delta|\mathbf{Y}) = \inf \left\{ R \,\middle|\, F(R) \le \frac{\delta}{2} \right\} \quad (0 \le \forall\delta < 2),$    (2.4.1)

where $F(R)$ is the function defined by

    $F(R) = \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{Y^n}(Y^n)} \ge R \right\}$    (2.4.2)

in (1.6.1) in §1.6 (here, notice that $X^n$ is replaced with $Y^n$) (Fig. 2.6).

Remark 2.4.1. The right-hand side of (2.4.1) is a right-continuous and
monotone decreasing function of $\delta$. In addition, Theorem 1.6.1 combined
with Theorem 2.4.1 yields a relationship

    $R_f(\varepsilon|\mathbf{X}) = S_r(2\varepsilon|\mathbf{X}) \quad (0 \le \forall\varepsilon < 1),$    (2.4.3)

which connects the $2\varepsilon$-resolvability with the $\varepsilon$-source coding. □

Fig. 2.6. (Labels: $R$, $H(\mathbf{Y})$.)

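Proof of Theorem 2.4.1.

1) Converse part:

Suppose that $R$ is $\delta$-achievable. Then, there exists a mapping
$\varphi_n : \mathcal{U}_{M_n} \to \mathcal{Y}^n$ satisfying $\limsup_{n\to\infty} \frac{1}{n} \log M_n \le R$ and
$\limsup_{n\to\infty} \delta_n \le \delta$, where $\delta_n = d(Y^n, \varphi_n(U_{M_n}))$. If we define
$\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ by $X^n = U_{M_n}$, we obtain

    $\lim_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} > R + \gamma \right\} = 0$

for any $\gamma > 0$ because $\overline{H}(\mathbf{X}) \le R$. Hence, by setting $a = R + \gamma$ and applying
Lemma 2.1.2, it follows that

    $\limsup_{n\to\infty} \delta_n \ge 2 \limsup_{n\to\infty} \Pr\{Y^n \notin T_n(R + 2\gamma)\}$
    $\qquad = 2F(R + 2\gamma).$

We notice here that $F(R + 2\gamma) \le \frac{\delta}{2}$ follows since $\limsup_{n\to\infty} \delta_n \le \delta$ is
satisfied. Now, set

    $R_0 = \inf \left\{ R \,\middle|\, F(R) \le \frac{\delta}{2} \right\}$    (2.4.4)

and assume that $R < R_0$. Since $\gamma > 0$ can be arbitrarily small, we can choose
$\gamma > 0$ satisfying $R + 2\gamma < R_0$. Therefore, we obtain $F(R + 2\gamma) > \frac{\delta}{2}$ from
the definition of $R_0$, which is a contradiction. Thus, $R \ge R_0$ must be satisfied.

2) Direct part:

We prove that $R = R_0 + 3\gamma$ ($\gamma > 0$) is $\delta$-achievable for $R_0$ defined by
(2.4.4). Setting $M_n = e^{nR}$, $\limsup_{n\to\infty} \frac{1}{n} \log M_n \le R$ clearly holds. If we
define a source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ by $X^n = U_{M_n}$, we have

    $\lim_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} < R - \gamma \right\} = 0$

because $\underline{H}(\mathbf{X}) = R$. Set $a = R - 2\gamma$ and denote by $\varphi_n$ the mapping
$\varphi_n : \mathcal{U}_{M_n} \to \mathcal{Y}^n$ defined in Lemma 2.1.1. Defining $\delta_n$ as
$\delta_n = d(Y^n, \varphi_n(U_{M_n}))$, it follows that

    $\limsup_{n\to\infty} \delta_n \le 2 \limsup_{n\to\infty} \Pr\{Y^n \notin T_n(R - 2\gamma)\}$
    $\qquad = 2F(R - 2\gamma)$
    $\qquad = 2F(R_0 + \gamma).$

On the other hand, we have $F(R_0 + \gamma) \le \frac{\delta}{2}$ from the definition of $R_0$. Thus,
we can conclude that $\limsup_{n\to\infty} \delta_n \le \delta$. □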



Example 2.4.1. For a source $\mathbf{Y}$ satisfying the strong converse property
(such as a stationary ergodic source), $F(R)$ can be illustrated as Fig. 1.7
in Example 1.6.1. Hence, the $\delta$-resolvability $S_r(\delta|\mathbf{Y})$ becomes a constant
independent of $0 \le \delta < 2$ (however, the converse is not always true). On the
other hand, for the mixed source considered in Example 1.4.1 in §1.4, $F(R)$
can be illustrated as Fig. 2.7. We have

    $S_r(\delta|\mathbf{Y}) = \begin{cases} H(P_2) & \text{for } 0 \le \delta < 2\alpha_2, \\ H(P_1) & \text{for } 2\alpha_2 \le \delta < 2 \end{cases}$

for such a mixed source. □

Fig. 2.7. (Plot of $F(R)$ against $R$; vertical-axis marks $1$ and $\alpha_2$, horizontal-axis marks $H(P_1)$ and $H(P_2)$.)
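
The case distinction in Example 2.4.1 can be reproduced numerically. The sketch below is a hedged illustration: it assumes that, for a finite mixture, $F(R)$ is the step function equal to the total weight of the components whose entropy rate is at least $R$ (as Fig. 2.7 suggests for two components), and the function name and the `(entropy_rate, weight)` input format are hypothetical.

```python
def delta_resolvability(components, delta):
    """inf{R | F(R) <= delta/2} for a step-function F.

    components: list of (entropy_rate, weight) pairs with weights summing to 1.
    Assumption: F(R) = total weight of components with entropy_rate >= R.
    """
    if not 0 <= delta < 2:
        raise ValueError("delta must satisfy 0 <= delta < 2")
    best = 0.0
    for h, _ in components:
        # F evaluated at R = h; F drops only just above one of the entropy rates.
        tail = sum(w for g, w in components if g >= h)
        if tail > delta / 2:
            # Every R <= h violates F(R) <= delta/2, so the infimum is at least h.
            best = max(best, h)
    return best
```

For two components with $H(P_1) < H(P_2)$ this returns $H(P_2)$ when $\delta < 2\alpha_2$ and $H(P_1)$ otherwise, matching the display above.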



Example 2.4.2. Let us consider the case that the target random number
$\mathbf{Y}$ has a continuous spectrum given in Example 1.4.3 in Chapter 1. Since
Lemma 1.4.4 (§1.4) claims that $F(R)$ satisfies

    $\int_{\{\theta \mid H(X_\theta) > R\}} dw(\theta) \le F(R) \le \int_{\{\theta \mid H(X_\theta) \ge R\}} dw(\theta),$

the $\delta$-resolvability can be given by

    $S_r(\delta|\mathbf{Y}) = \inf \left\{ R \,\middle|\, \int_{\{\theta \mid H(X_\theta) > R\}} dw(\theta) \le \frac{\delta}{2} \right\}$

(Fig. 2.8). □

Fig. 2.8. (Plot of $F(R)$.)



We can consider the $\delta$-intrinsic randomness in a manner similar to the
$\delta$-resolvability. We describe the $\delta$-intrinsic randomness below. First, for a
mapping

    $\varphi_n : \mathcal{X}^n \to \mathcal{U}_{M_n} \equiv \{1, \cdots, M_n\}$

we define the variational distance by

    $\delta_n \equiv d(U_{M_n}, \varphi_n(X^n)).$

While in the intrinsic randomness problem given in §2.2 we require that the
variational distance $\delta_n$ between the probability distributions satisfies $\delta_n \to 0$
as $n \to \infty$, we only require here that $\delta_n$ satisfies

    $\limsup_{n\to\infty} \delta_n \le \delta$

for an arbitrarily fixed constant $0 \le \delta < 2$. We can expect to have a greater
achievable rate under this weakened requirement on the probability distribution
approximation. We give the following definitions:

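Definition 2.4.3.

    Rate $R$ is $\delta$-achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists a mapping
    $\varphi_n : \mathcal{X}^n \to \mathcal{U}_{M_n}$ satisfying $\liminf_{n\to\infty} \frac{1}{n} \log M_n \ge R$ and

    $\limsup_{n\to\infty} d(U_{M_n}, \varphi_n(X^n)) \le \delta.$

Definition 2.4.4 (δ-intrinsic randomness: variational distance).

    $S_\iota(\delta|\mathbf{X}) = \sup \{R \mid R \text{ is } \delta\text{-achievable}\}.$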



Fig. 2.9. (Plot of $G(R)$.)

Now, we define the function $G(R)$ by

    $G(R) = \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} < R \right\},$

which is a dual counterpart to $F(R)$ in (2.4.2) (Fig. 2.9). Then, we have
the following theorem, which is a dual counterpart to Theorem 2.4.1 and is
regarded as a generalization of Theorem 2.2.2.

Theorem 2.4.2.

    $S_\iota(\delta|\mathbf{X}) = \sup \left\{ R \,\middle|\, G(R) \le \frac{\delta}{2} \right\} \quad (0 \le \forall\delta < 2).$    (2.4.5)

Remark 2.4.2. The right-hand side of (2.4.5) is a right-continuous and
monotone increasing function of $\delta$. □



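Proof of Theorem 2.4.2.

1) Converse part:

Suppose that $R$ is $\delta$-achievable. Then, there exists a mapping
$\varphi_n : \mathcal{X}^n \to \mathcal{U}_{M_n}$ satisfying $\liminf_{n\to\infty} \frac{1}{n} \log M_n \ge R$ and
$\limsup_{n\to\infty} \delta_n \le \delta$, where $\delta_n = d(U_{M_n}, \varphi_n(X^n))$. If we define
$\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ by $Y^n = U_{M_n}$, we have

    $\lim_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{Y^n}(Y^n)} < R - \gamma \right\} = 0$

for any $\gamma > 0$ because $\underline{H}(\mathbf{Y}) \ge R$. Then, setting $a = R - \gamma$ and applying
Lemma 2.1.2 (recalling Remark 2.1.2), it follows that

    $\limsup_{n\to\infty} \delta_n \ge 2 \limsup_{n\to\infty} \Pr\{X^n \notin S_n(R - 2\gamma)\}$
    $\qquad = 2G(R - 2\gamma).$

On the other hand, since $\limsup_{n\to\infty} \delta_n \le \delta$, we have $G(R - 2\gamma) \le \frac{\delta}{2}$. Now, define

    $R_1 = \sup \left\{ R \,\middle|\, G(R) \le \frac{\delta}{2} \right\}.$    (2.4.6)

If we assume here that $R > R_1$, we can choose a $\gamma > 0$ satisfying $R - 2\gamma > R_1$
because $\gamma > 0$ can be arbitrarily small. Thus, we obtain $G(R - 2\gamma) > \frac{\delta}{2}$ from
the definition of $R_1$, which is a contradiction. Accordingly, $R \le R_1$ must be
satisfied.

2) Direct part:

We prove that $R = R_1 - 3\gamma$ ($\gamma > 0$) is $\delta$-achievable for $R_1$ defined by
(2.4.6). Setting $M_n = e^{nR}$, $\liminf_{n\to\infty} \frac{1}{n} \log M_n \ge R$ trivially holds. If we
define $\mathbf{Y}$ by $Y^n = U_{M_n}$, we obtain

    $\lim_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{Y^n}(Y^n)} > R + \gamma \right\} = 0$

because $\overline{H}(\mathbf{Y}) = R$. Now, set $a = R + \gamma$ and denote by $\varphi_n$ the mapping
$\varphi_n : \mathcal{X}^n \to \mathcal{U}_{M_n}$ defined in Lemma 2.1.1. Defining $\delta_n$ as
$\delta_n = d(U_{M_n}, \varphi_n(X^n))$, it follows that

    $\limsup_{n\to\infty} \delta_n \le 2 \limsup_{n\to\infty} \Pr\{X^n \notin S_n(R + 2\gamma)\}$
    $\qquad = 2G(R + 2\gamma)$
    $\qquad = 2G(R_1 - \gamma).$

On the other hand, we have $G(R_1 - \gamma) \le \frac{\delta}{2}$ from the definition of $R_1$. Thus,
we can conclude that $\limsup_{n\to\infty} \delta_n \le \delta$. □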



Example 2.4.3. For a source $\mathbf{X}$ satisfying the strong converse property
(such as a stationary ergodic source), $G(R)$ can be illustrated as in Fig. 2.10.
Hence, $S_\iota(\delta|\mathbf{X})$ becomes a constant independent of $0 \le \delta < 2$ (however, the
converse is not always true). On the other hand, for the mixed source
considered in Example 1.4.1 in §1.4, $G(R)$ can be illustrated as in Fig. 2.11. We
have

    $S_\iota(\delta|\mathbf{X}) = \begin{cases} H(P_1) & \text{for } 0 \le \delta < 2\alpha_1, \\ H(P_2) & \text{for } 2\alpha_1 \le \delta < 2 \end{cases}$

for such a mixed source. □

Fig. 2.10. (Plot of $G(R)$; vertical-axis marks $0$ and $1$, horizontal-axis mark $H(\mathbf{X})$.)

Fig. 2.11. (Plot of $G(R)$ against $R$; vertical-axis marks $0$ and $1$, horizontal-axis marks $H(P_1)$ and $H(P_2)$.)
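
The dual computation for Example 2.4.3 can be sketched in the same way. This is an illustration under a stated assumption: for a finite mixture, $G(R)$ is taken to be the step function equal to the total weight of the components whose entropy rate is strictly below $R$ (as Fig. 2.11 suggests). The function name and the `(entropy_rate, weight)` input format are hypothetical.

```python
def delta_intrinsic_randomness(components, delta):
    """sup{R | G(R) <= delta/2} for a step-function G.

    components: list of (entropy_rate, weight) pairs with weights summing to 1.
    Assumption: G(R) = total weight of components with entropy_rate < R.
    """
    if not 0 <= delta < 2:
        raise ValueError("delta must satisfy 0 <= delta < 2")
    candidates = []
    for h, _ in components:
        # Value that G takes just above R = h.
        head = sum(w for g, w in components if g <= h)
        if head > delta / 2:
            # G exceeds delta/2 for every R > h, so the supremum is at most h.
            candidates.append(h)
    return min(candidates)
```

For two components with $H(P_1) < H(P_2)$ this returns $H(P_1)$ when $\delta < 2\alpha_1$ and $H(P_2)$ otherwise.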



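Example 2.4.4. Let us consider the mixed source $\mathbf{X}$ with a continuous
spectrum given in Example 1.4.3 in §1.4 of Chapter 1. Since Lemma 1.4.4 (§1.4)
guarantees that $G(R)$ satisfies

    $\int_{\{\theta \mid H(X_\theta) < R\}} dw(\theta) \le G(R) \le \int_{\{\theta \mid H(X_\theta) \le R\}} dw(\theta),$

the $\delta$-intrinsic randomness is given by

    $S_\iota(\delta|\mathbf{X}) = \sup \left\{ R \,\middle|\, \int_{\{\theta \mid H(X_\theta) < R\}} dw(\theta) \le \frac{\delta}{2} \right\}$

(Fig. 2.12). □

Fig. 2.12. (Plot of $G(R)$.)

At the end of this section let us consider the $\delta$-resolvability based on the
normalized divergence distance, though we have already described the
$\delta$-resolvability based on the variational distance (Theorem 2.4.1). We give the
following definitions for formulating this problem. Here, suppose that $\delta$ is an
arbitrary constant satisfying $\delta \ge 0$.

Definition 2.4.5.

    Rate $R$ is $\delta$-achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists a mapping
    $\varphi_n : \mathcal{U}_{M_n} \to \mathcal{Y}^n$ satisfying $\limsup_{n\to\infty} \frac{1}{n} \log M_n \le R$ and

    $\limsup_{n\to\infty} \frac{1}{n} D(\varphi_n(U_{M_n}) \| Y^n) \le \delta.$

Definition 2.4.6 (δ-resolvability: normalized divergence distance).

    $S_r^*(\delta|\mathbf{Y}) = \inf \{R \mid R \text{ is } \delta\text{-achievable}\}.$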



The $\delta$-resolvability $S_r^*(\delta|\mathbf{Y})$ defined in this way is a right-continuous and
monotone decreasing function of $\delta$. In fact, $S_r^*(\delta|\mathbf{Y})$ is deeply related to the
coding rate $R_f^*(r|\mathbf{X})$, given in Chapter 1, for making the exponent of the
probability of correct decoding less than $r$ under the optimal fixed-length
source coding (the infimum $r$-achievable fixed-length coding rate: part 2 (see
Definition 1.10.2)). That is, we have the following interesting theorem that
connects a random number generation problem with a source coding problem.

Theorem 2.4.3 (Steinberg and Verdú [85]).

    $S_r^*(\delta|\mathbf{Y}) = R_f^*(\delta|\mathbf{Y}) \quad (\forall\delta \ge 0).$    (2.4.7)

Remark 2.4.3. Theorem 2.4.3 combined with Theorem 1.10.1 in Chapter 1
enables us to compute values of the $\delta$-resolvability $S_r^*(\delta|\mathbf{Y})$. In addition,
if we define $S_r^*(\mathbf{Y}) = S_r^*(0|\mathbf{Y})$ ($\delta = 0$), Theorem 2.4.3 implies that
$S_r^*(\mathbf{Y}) = R_f^*(0|\mathbf{Y})$ ($\delta = 0$). This is nothing but the resolvability based on
the normalized divergence distance described in Remark 2.2.3. It is easy to
verify from Theorem 1.10.1 that $S_r^*(\mathbf{Y}) \le \underline{H}(\mathbf{Y})$ always holds provided that
(1.10.2) with $Y^n$ instead of $X^n$ has a limit. That is, $S_r^*(\mathbf{Y})$ becomes much
smaller than the resolvability $S_r(\mathbf{Y}) = \overline{H}(\mathbf{Y})$ based on the variational
distance. □



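Proof of Theorem 2.4.3.

1) Direct part:

We show that any rate $R$ satisfying $R > R_f^*(\delta|\mathbf{Y})$ is $\delta$-achievable as a
rate of the resolvability. First, we fix an arbitrarily small constant $\gamma > 0$ that
satisfies $R > R - 2\gamma > R_f^*(\delta|\mathbf{Y})$. Then, from the definition of $R_f^*(\delta|\mathbf{Y})$ there
exists a fixed-length source code $(n, L_n, \varepsilon_n)$ satisfying

    $\limsup_{n\to\infty} \frac{1}{n} \log \frac{1}{1 - \varepsilon_n} \le \delta,$

    $\limsup_{n\to\infty} \frac{1}{n} \log L_n \le R - 2\gamma,$

which can be rewritten as

    $1 - \varepsilon_n \ge e^{-n(\delta + \tau_n)} \quad (\forall n \ge n_0),$    (2.4.8)

    $L_n \le e^{n(R - \gamma)} \quad (\forall n \ge n_0).$    (2.4.9)

Here, $\{\tau_n\}$ denotes a sequence of positive numbers satisfying $\lim_{n\to\infty} \tau_n = 0$.
Now, denote by $(\varphi_n^*, \psi_n^*)$ the pair of the encoder and the decoder
corresponding to the fixed-length code $(n, L_n, \varepsilon_n)$ and define

    $T_n = \{\mathbf{y} \in \mathcal{Y}^n \mid \mathbf{y} = \psi_n^*(\varphi_n^*(\mathbf{y}))\}.$

Clearly, we have

    $|T_n| \le L_n$    (2.4.10)

and we can express the error probability $\varepsilon_n$ as $\varepsilon_n = \Pr\{Y^n \notin T_n\}$. Denote
by $P_{\overline{Y}^n}$ the probability distribution over $T_n$ defined by

    $P_{\overline{Y}^n}(\mathbf{y}) = \begin{cases} \dfrac{P_{Y^n}(\mathbf{y})}{1 - \varepsilon_n} & \text{for } \mathbf{y} \in T_n, \\ 0 & \text{otherwise} \end{cases}$

and set $M_n = e^{nR}$. By taking (2.4.9) and (2.4.10) into consideration and
using an argument similar to the proof of Lemma 2.1.1, we can show that
there exists a transform $\varphi_n : \mathcal{U}_{M_n} \to T_n$ satisfying

    $P_{\tilde{Y}^n}(\mathbf{y}_i) \le P_{\overline{Y}^n}(\mathbf{y}_i) \quad (1 \le i \le i_0 - 1),$

    $P_{\tilde{Y}^n}(\mathbf{y}_{i_0}) \le P_{\overline{Y}^n}(\mathbf{y}_{i_0}) + e^{-n\gamma},$

where $T_n = \{\mathbf{y}_1, \mathbf{y}_2, \cdots, \mathbf{y}_{i_0}\}$ ($i_0 \le L_n$) and $\tilde{Y}^n = \varphi_n(U_{M_n})$. Since, without
loss of generality, we can assume that $P_{Y^n}(\mathbf{y}) > 0$ for $\mathbf{y} \in T_n$,
$P_{\overline{Y}^n}(\mathbf{y}_i) \ge \frac{1}{L_n}$ must be satisfied for some $1 \le i \le i_0$. Hence, we can consider
that $\mathbf{y}_{i_0}$ satisfies $P_{\overline{Y}^n}(\mathbf{y}_{i_0}) \ge \frac{1}{L_n}$. Then, the divergence can be upper-bounded
in the following way:

    $D(\varphi_n(U_{M_n}) \| Y^n)$
    $\quad = \sum_{i=1}^{i_0} P_{\tilde{Y}^n}(\mathbf{y}_i) \log \frac{P_{\tilde{Y}^n}(\mathbf{y}_i)}{P_{Y^n}(\mathbf{y}_i)}$
    $\quad = \sum_{i=1}^{i_0} P_{\tilde{Y}^n}(\mathbf{y}_i) \log \frac{P_{\tilde{Y}^n}(\mathbf{y}_i)}{P_{\overline{Y}^n}(\mathbf{y}_i)(1 - \varepsilon_n)}$
    $\quad = \sum_{i=1,\, i \ne i_0}^{i_0} P_{\tilde{Y}^n}(\mathbf{y}_i) \log \frac{P_{\tilde{Y}^n}(\mathbf{y}_i)}{P_{\overline{Y}^n}(\mathbf{y}_i)} + P_{\tilde{Y}^n}(\mathbf{y}_{i_0}) \log \frac{P_{\tilde{Y}^n}(\mathbf{y}_{i_0})}{P_{\overline{Y}^n}(\mathbf{y}_{i_0})} + \log \frac{1}{1 - \varepsilon_n}$
    $\quad \le P_{\tilde{Y}^n}(\mathbf{y}_{i_0}) \log \frac{P_{\tilde{Y}^n}(\mathbf{y}_{i_0})}{P_{\overline{Y}^n}(\mathbf{y}_{i_0})} + \log \frac{1}{1 - \varepsilon_n}$
    $\quad \le P_{\tilde{Y}^n}(\mathbf{y}_{i_0}) \log \frac{P_{\overline{Y}^n}(\mathbf{y}_{i_0}) + e^{-n\gamma}}{P_{\overline{Y}^n}(\mathbf{y}_{i_0})} + n(\delta + \tau_n).$    (2.4.11)

Now, we evaluate

    $A_n \equiv P_{\tilde{Y}^n}(\mathbf{y}_{i_0}) \log \frac{P_{\overline{Y}^n}(\mathbf{y}_{i_0}) + e^{-n\gamma}}{P_{\overline{Y}^n}(\mathbf{y}_{i_0})}$

on the right-hand side of (2.4.11) according to the following two cases. For
the case of $P_{\overline{Y}^n}(\mathbf{y}_{i_0}) \ge e^{-n\gamma}$, $A_n$ can be evaluated as

    $A_n \le P_{\tilde{Y}^n}(\mathbf{y}_{i_0}) \log \left( \frac{2 P_{\overline{Y}^n}(\mathbf{y}_{i_0})}{P_{\overline{Y}^n}(\mathbf{y}_{i_0})} \right) \le \log 2.$    (2.4.12)

For the case of $P_{\overline{Y}^n}(\mathbf{y}_{i_0}) < e^{-n\gamma}$, $A_n$ can be evaluated as

    $A_n \le P_{\tilde{Y}^n}(\mathbf{y}_{i_0}) \log \left( \frac{2e^{-n\gamma}}{P_{\overline{Y}^n}(\mathbf{y}_{i_0})} \right) \le P_{\tilde{Y}^n}(\mathbf{y}_{i_0}) \log (2e^{-n\gamma} L_n)$
    $\quad \le 2ne^{-n\gamma}(R - 2\gamma) + 2e^{-n\gamma} \log 2.$    (2.4.13)

Therefore, $\frac{A_n}{n} \to 0$ as $n \to \infty$ is satisfied for both cases. By noticing that
$\tau_n \to 0$ as $n \to \infty$, we obtain

    $\limsup_{n\to\infty} \frac{1}{n} D(\varphi_n(U_{M_n}) \| Y^n) \le \delta$

from (2.4.11). That is, any rate $R$ satisfying $R > R_f^*(\delta|\mathbf{Y})$ is $\delta$-achievable as
a rate of the resolvability.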



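2) Converse part:

Fix an arbitrary rate $R$ satisfying $R < R_f^*(\delta|\mathbf{Y})$. We can choose a
sufficiently small $\gamma > 0$ that satisfies $R < R + 2\gamma < R_f^*(\delta|\mathbf{Y})$. First, we order all
the elements of $\mathcal{Y}^n$ in the decreasing order of their probabilities as follows:

    $P_{Y^n}(\mathbf{y}_1) \ge P_{Y^n}(\mathbf{y}_2) \ge P_{Y^n}(\mathbf{y}_3) \ge \cdots.$    (2.4.14)

We write $\mathcal{Y}^n$ as $\mathcal{Y}^n = \{\mathbf{y}_1, \mathbf{y}_2, \cdots\}$. Define $L_n = e^{n(R + 2\gamma)}$ and set

    $T_n = \{\mathbf{y}_1, \mathbf{y}_2, \cdots, \mathbf{y}_{L_n}\},$    (2.4.15)

    $\varepsilon_n = \Pr\{Y^n \notin T_n\}.$    (2.4.16)

We can choose a pair $(\varphi_n^*, \psi_n^*)$ of an encoder and a decoder satisfying the
condition $T_n = \{\mathbf{y} \in \mathcal{Y}^n \mid \mathbf{y} = \psi_n^*(\varphi_n^*(\mathbf{y}))\}$. From (2.4.15) and (2.4.16), this
$(\varphi_n^*, \psi_n^*)$ can be regarded as an $(n, L_n, \varepsilon_n)$-code. We notice here that, since

    $\frac{1}{n} \log L_n = R + 2\gamma < R_f^*(\delta|\mathbf{Y}) \quad (\forall n = 1, 2, \cdots),$

the definition of $R_f^*(\delta|\mathbf{Y})$ tells us that

    $\limsup_{n\to\infty} \frac{1}{n} \log \frac{1}{1 - \varepsilon_n} > \delta$

must be satisfied. Here, since $\gamma > 0$ can be arbitrarily small, we can choose
a $\gamma$ satisfying

    $\limsup_{n\to\infty} \frac{1}{n} \log \frac{1}{1 - \varepsilon_n} > \delta + 2\gamma,$

which implies the existence of a sequence of positive integers $\{n_j\}$ with
$n_1 < n_2 < \cdots \to +\infty$ that satisfies

    $1 - \varepsilon_{n_j} \le e^{-n_j(\delta + \gamma)} \quad (\forall j \ge j_0).$    (2.4.17)

Now, suppose that an arbitrary sequence of positive integers $\{M_n\}$ satisfying

    $\limsup_{n\to\infty} \frac{1}{n} \log M_n \le R$    (2.4.18)

is given and define $\tilde{Y}^n = \varphi_n(U_{M_n})$ for an arbitrary transform
$\varphi_n : \mathcal{U}_{M_n} \to \mathcal{Y}^n$. We rewrite (2.4.18) as

    $M_n \le e^{n(R + \gamma)} \quad (\forall n \ge n_0).$    (2.4.19)

If we set

    $B_n = \{\mathbf{y} \mid P_{\tilde{Y}^n}(\mathbf{y}) > 0\},$

we obtain

    $|B_n| \le e^{n(R + \gamma)} \quad (\forall n \ge n_0).$    (2.4.20)

Then, since (2.4.15) and (2.4.20) lead to

    $|T_n \cap B_n^c| \ge |T_n| - |B_n|$
    $\quad \ge e^{n(R + 2\gamma)} - e^{n(R + \gamma)}$
    $\quad \ge \frac{1}{2} e^{n(R + 2\gamma)} \quad (\forall n \ge n_0),$

    $|T_n^c \cap B_n| \le |B_n| \le e^{n(R + \gamma)} \quad (\forall n \ge n_0),$

we obtain

    $|T_n \cap B_n^c| \ge |T_n^c \cap B_n| \quad (\forall n \ge n_0).$    (2.4.21)

This means that there exists a one-to-one mapping $g_n : T_n^c \cap B_n \to T_n \cap B_n^c$.
We note here that

    $P_{Y^n}(\mathbf{y}) \le P_{Y^n}(g_n(\mathbf{y})) \quad (\mathbf{y} \in T_n^c \cap B_n)$    (2.4.22)

holds because all the elements of $\mathcal{Y}^n$ are ordered in the decreasing order of
their probabilities as in (2.4.14). We now set

    $\mathcal{C}_n = \{m \in \mathcal{U}_{M_n} \mid \varphi_n(m) \in T_n \cap B_n\},$
    $\mathcal{D}_n = \{m \in \mathcal{U}_{M_n} \mid \varphi_n(m) \in T_n^c \cap B_n\}$

and define $\hat{\varphi}_n : \mathcal{U}_{M_n} \to T_n$ as the mapping satisfying $\hat{\varphi}_n(m) = \varphi_n(m)$ for
$m \in \mathcal{C}_n$ and $\hat{\varphi}_n(m) = g_n(\varphi_n(m))$ for $m \in \mathcal{D}_n$. Define $\hat{Y}^n = \hat{\varphi}_n(U_{M_n})$.
Then, by noticing that $P_{\tilde{Y}^n}(\mathbf{y}) = P_{\hat{Y}^n}(g_n(\mathbf{y}))$ for $\mathbf{y} \in T_n^c \cap B_n$ and (2.4.22),
$D(\varphi_n(U_{M_n}) \| Y^n)$ can be evaluated in the following way:

    $D(\varphi_n(U_{M_n}) \| Y^n)$
    $\quad = \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{\tilde{Y}^n}(\mathbf{y}) \log \frac{P_{\tilde{Y}^n}(\mathbf{y})}{P_{Y^n}(\mathbf{y})}$
    $\quad = \sum_{\mathbf{y} \in T_n \cap B_n} P_{\tilde{Y}^n}(\mathbf{y}) \log \frac{P_{\tilde{Y}^n}(\mathbf{y})}{P_{Y^n}(\mathbf{y})} + \sum_{\mathbf{y} \in T_n^c \cap B_n} P_{\tilde{Y}^n}(\mathbf{y}) \log \frac{P_{\tilde{Y}^n}(\mathbf{y})}{P_{Y^n}(\mathbf{y})}$
    $\quad \ge \sum_{\mathbf{y} \in T_n \cap B_n} P_{\hat{Y}^n}(\mathbf{y}) \log \frac{P_{\hat{Y}^n}(\mathbf{y})}{P_{Y^n}(\mathbf{y})} + \sum_{\mathbf{y} \in T_n^c \cap B_n} P_{\hat{Y}^n}(g_n(\mathbf{y})) \log \frac{P_{\hat{Y}^n}(g_n(\mathbf{y}))}{P_{Y^n}(g_n(\mathbf{y}))}$
    $\quad = \sum_{\mathbf{y} \in T_n} P_{\hat{Y}^n}(\mathbf{y}) \log \frac{P_{\hat{Y}^n}(\mathbf{y})}{P_{Y^n}(\mathbf{y})}$
    $\quad \ge \log \frac{1}{\Pr\{Y^n \in T_n\}}$
    $\quad = \log \frac{1}{1 - \varepsilon_n},$

where the last inequality follows from the log-sum inequality (see Lemma 1.2.3).
This, together with (2.4.17), yields

    $D(\varphi_{n_j}(U_{M_{n_j}}) \| Y^{n_j}) \ge \log \frac{1}{1 - \varepsilon_{n_j}}$
    $\quad \ge n_j(\delta + \gamma) \quad (\forall j \ge j_0).$

Consequently, we obtain

    $\limsup_{n\to\infty} \frac{1}{n} D(\varphi_n(U_{M_n}) \| Y^n) \ge \limsup_{j\to\infty} \frac{1}{n_j} D(\varphi_{n_j}(U_{M_{n_j}}) \| Y^{n_j})$
    $\quad \ge \delta + \gamma.$

Since $\gamma > 0$, this means that any rate $R$ satisfying $R < R_f^*(\delta|\mathbf{Y})$ is not
$\delta$-achievable as the rate of the resolvability. □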

Remark 2.4.4. We can also consider the $\delta$-intrinsic randomness problem
based on the normalized divergence distance as well as the $\delta$-resolvability
problem based on the normalized divergence distance. Let $S_\iota^*(\delta|\mathbf{X})$ be the
quantity corresponding to $S_\iota(\delta|\mathbf{X})$ in Definition 2.4.4 obtained by replacing
the variational distance $d(U_{M_n}, \varphi_n(X^n))$ in Definition 2.4.3 with the
normalized divergence distance $\frac{1}{n} D(\varphi_n(X^n) \| U_{M_n})$. We are also interested in
characterizing $S_\iota^*(\delta|\mathbf{X})$ as a function of $\delta$. However, up to now we do not
have an answer to this problem. □



2.5 Variable-Length Intrinsic Randomness 

In this section we attempt to generalize the intrinsic randomness problem
considered in §2.2. To this end, letting $\mathcal{U} = \{0, 1, 2, \cdots, K - 1\}$ be an
arbitrary code alphabet and $m$ an arbitrary nonnegative integer, denote by
$U^{(m)}$ the uniform random number over $\mathcal{U}^m$ (we call $m$ the length of $U^{(m)}$).
Here, $U^{(0)}$ is defined as the random variable taking the null string $\Lambda$ of
length 0 with probability 1. Then, the uniform random number $U_{M_n}$ of size $M_n$
considered in §2.2 is equivalent to the uniform random number $U^{(m_n)}$ on
$\mathcal{U}^{m_n}$, where

    $m_n = \log_K M_n.$

We call this uniform random number a fixed-length uniform random number
of length $m_n$. Thus, the intrinsic randomness problem in §2.2 is equivalent
to generating a fixed-length uniform random number $U^{(m_n)}$ of the largest
possible length $m_n$ by transforming a source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$. This problem is
called the fixed-length intrinsic randomness problem because all the lengths
of random numbers that can be generated are equal to $m_n$. In order to
formally define the fixed-length intrinsic randomness problem, we have only to
replace Definition 2.2.3, which appeared in the intrinsic randomness problem
(variational distance) in §2.2, with



Definition 2.5.1.

    Rate $R$ is achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exist an $m_n$ and a mapping
    $\varphi_n : \mathcal{X}^n \to \mathcal{U}^{m_n}$ satisfying $\liminf_{n\to\infty} \frac{m_n}{n} \ge R$
    and $\lim_{n\to\infty} d(U^{(m_n)}, \varphi_n(X^n)) = 0$.

Theorem 2.2.2 still holds under this definition, where we use $K$ as the base
of the logarithm of $\underline{H}(\mathbf{X})$ on the right-hand side of (2.2.2).

We can consider the variable-length intrinsic randomness problem as
well as the fixed-length intrinsic randomness. Hereafter, we investigate the
variable-length intrinsic randomness problem. First, we call $U^{(I)}$ a
variable-length uniform random number if $I$ is an arbitrary random variable taking
nonnegative integers. Let $\mathcal{U}^*$ be the set of all finite strings of $\mathcal{U}$ containing
the null sequence $\Lambda$ of length 0. For an arbitrary given variable-length mapping
$\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$ we set

    $\mathcal{D}_m = \{\mathbf{x} \in \mathcal{X}^n \mid |\varphi_n(\mathbf{x})| = m\}$    (2.5.1)

for an arbitrary nonnegative integer $m$. Denote by $J(\varphi_n)$ the set of all
nonnegative integers $m$ satisfying $\Pr\{X^n \in \mathcal{D}_m\} > 0$. Furthermore, for each
$m \in J(\varphi_n)$ define the random variable $X_m^n$ over $\mathcal{D}_m$ subject to the
probability distribution

    $P_{X_m^n}(\mathbf{x}) = \frac{P_{X^n}(\mathbf{x})}{\Pr\{X^n \in \mathcal{D}_m\}} \quad (\mathbf{x} \in \mathcal{D}_m),$    (2.5.2)

which is the conditional probability distribution of $X^n$ given $X^n \in \mathcal{D}_m$. While
in the fixed-length intrinsic randomness problem we require that $\varphi_n(X^n)$
asymptotically coincides with a uniform random number, in the variable-length
intrinsic randomness problem we weaken this requirement. We only
require $\varphi_n(X_m^n)$ to asymptotically coincide with the uniform random
number $U^{(m)}$ of length $m$, for each $m \in J(\varphi_n)$. That is, we require that $\varphi_n(X^n)$
asymptotically coincides with the variable-length uniform random number.
Then, the average length per source symbol of the variable-length uniform
random number generated by $\varphi_n$ is given by

    $\frac{1}{n} \mathrm{E}|\varphi_n(X^n)| = \frac{1}{n} \sum_{m \in J(\varphi_n)} m \Pr\{X^n \in \mathcal{D}_m\},$    (2.5.3)

which is called the generating rate of the variable-length uniform random
number. Given a coin random number $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$, we would like to
transform $\mathbf{X}$ into a variable-length uniform random number with the largest
possible rate (2.5.3). We first formulate this problem based on the variational
distance $d$ in the following manner:



Definition 2.5.2.

    Rate $R$ is achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists a transform
    $\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$ satisfying $\liminf_{n\to\infty} \frac{1}{n} \mathrm{E}|\varphi_n(X^n)| \ge R$ and

    $\lim_{n\to\infty} \sup_{m \in J(\varphi_n)} d(U^{(m)}, \varphi_n(X_m^n)) = 0.$

Definition 2.5.3 (Variable-length intrinsic randomness: variational
distance).

    $S_v(\mathbf{X}) = \sup \{R \mid R \text{ is achievable}\}.$
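
The two quantities in Definition 2.5.2, the generating rate (2.5.3) and the supremum over $m$ of the variational distance, can be evaluated by brute force for small block lengths. The Python sketch below does this for an i.i.d. Bernoulli($p$) source with binary code alphabet ($K = 2$). The example mapping is the classical von Neumann pairing rule, which is an assumption chosen for illustration and is not a construction taken from this section; the function names are hypothetical.

```python
from itertools import product
from collections import defaultdict

def von_neumann(x):
    """Example mapping: read x in disjoint pairs and emit the first bit of each
    unequal pair, so (0,1) gives 0 and (1,0) gives 1; equal pairs emit nothing."""
    return tuple(a for a, b in zip(x[0::2], x[1::2]) if a != b)

def analyze(phi, n, p):
    """Return (generating rate, sup over m of d(U^(m), phi(X_m^n))) for an
    i.i.d. Bernoulli(p) source of block length n, by exhaustive enumeration."""
    prob_len = defaultdict(float)                    # Pr{X^n in D_m}
    joint = defaultdict(lambda: defaultdict(float))  # joint[m][u] = Pr{phi(X^n) = u}
    for x in product((0, 1), repeat=n):
        px = 1.0
        for s in x:
            px *= p if s == 1 else 1.0 - p
        u = phi(x)
        prob_len[len(u)] += px
        joint[len(u)][u] += px
    rate = sum(m * q for m, q in prob_len.items()) / n
    sup_d = 0.0
    for m, q in prob_len.items():
        if q <= 0.0:
            continue                                 # m is not in J(phi_n)
        # Variational distance between the conditional output law and U^(m).
        d = sum(abs(joint[m].get(u, 0.0) / q - 2.0 ** (-m))
                for u in product((0, 1), repeat=m))
        sup_d = max(sup_d, d)
    return rate, sup_d
```

For this particular mapping the conditional output given its length is exactly uniform, so the supremum is zero up to floating-point error, while the rate $p(1-p)$ stays well below the entropy rate; the theorem that follows concerns how large the rate can be made.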



We have the following theorem corresponding to Theorem 2.2.2 on the
fixed-length intrinsic randomness.

Theorem 2.5.1 (Han [40]).

    $S_v(\mathbf{X}) = \liminf_{n\to\infty} \frac{1}{n} H_K(X^n).$    (2.5.4)

Here, we call the right-hand side of (2.5.4) the inf-entropy rate of
$\mathbf{X} = \{X^n\}_{n=1}^{\infty}$, where $K$ denotes the base of logarithms.

Remark 2.5.1. Theorem 1.7.2 and Theorem 2.2.2 tell us that
$S_\iota(\mathbf{X}) \le S_v(\mathbf{X})$ holds for any coin random number $\mathbf{X}$. That is, we can generally
make the rate of the variable-length uniform random number greater than
the achievable rate of the fixed-length uniform random number. This fact
is obvious because the variable-length uniform random number includes the
fixed-length uniform random number as a special case. In addition, as is
seen in the proof below, Theorem 2.5.1 does not generally hold if $m$ in
Definition 2.5.2 is restricted to positive integers. □



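Proof of Theorem 2.5.1.

1) Direct part:

Let the coin random number $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ be given. Let $\gamma > 0$ be an
arbitrarily small constant and define

    $I_j \equiv [R_j, R_{j+1}) \quad (j = 0, 1, 2, \cdots)$

as the subintervals partitioning the interval $[0, +\infty)$ each of which has the
width $3\gamma$, where $R_j = 3j\gamma$. We partition $\mathcal{X}^n$ into the following disjoint subsets
according to these subintervals (information-spectrum slicing):

    $S_n^{(j)} \equiv \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n} \log_K \frac{1}{P_{X^n}(\mathbf{x})} \in I_j \right\} \quad (j = 0, 1, 2, \cdots).$

Next, we partition $J = \{0, 1, 2, \cdots\}$ into the two subsets as follows:

    $J_1 = \left\{ j \ge 1 \,\middle|\, \Pr\{X^n \in S_n^{(j)}\} \ge K^{-n\gamma R_j} \right\},$    (2.5.5)

    $J_2 = \{0\} \cup \left\{ j \ge 1 \,\middle|\, \Pr\{X^n \in S_n^{(j)}\} < K^{-n\gamma R_j} \right\}.$    (2.5.6)

In addition, for each $j \in J_1$ we define the random variable $X_j^n$ taking values
in $S_n^{(j)}$ subject to the probability distribution

    $P_{X_j^n}(\mathbf{x}) = \frac{P_{X^n}(\mathbf{x})}{\Pr\{X^n \in S_n^{(j)}\}} \quad (\mathbf{x} \in S_n^{(j)}),$    (2.5.7)

which is equal to the conditional probability distribution of $X^n$ given
$X^n \in S_n^{(j)}$. On the other hand, since $P_{X^n}(\mathbf{x}) \le K^{-nR_j}$ for $\mathbf{x} \in S_n^{(j)}$, (2.5.5) and
(2.5.7) guarantee that

    $P_{X_j^n}(\mathbf{x}) = \frac{P_{X^n}(\mathbf{x})}{\Pr\{X^n \in S_n^{(j)}\}} \le K^{-n(1 - \gamma)R_j}$    (2.5.8)

for all $\mathbf{x} \in S_n^{(j)}$ provided that $j \in J_1$. We now use Lemma 2.1.1 in the
following manner. We replace $\mathcal{X}^n$ and $\mathcal{Y}^n$ with

    $S_n^{(j)}$ and $\mathcal{U}^{\lfloor n(1 - 2\gamma)R_j \rfloor},$

respectively, and use

    $X_j^n$, $U^{(\lfloor n(1 - 2\gamma)R_j \rfloor)}$, $(1 - 2\gamma)R_j$ and $\gamma R_j$

instead of $X^n$, $Y^n$, $a$ and $\gamma$, respectively. Since $\Pr\{X^n \notin S_n(a + \gamma)\} = 0$ and
$\Pr\{Y^n \notin T_n(a)\} = 0$ are satisfied, it turns out that there exists a mapping
$\varphi_n^{(j)} : S_n^{(j)} \to \mathcal{U}^{\lfloor n(1 - 2\gamma)R_j \rfloor}$ satisfying

    $d(U^{(\lfloor n(1 - 2\gamma)R_j \rfloor)}, \varphi_n^{(j)}(X_j^n)) \le 2K^{-n\gamma R_j} \quad (j \in J_1).$    (2.5.9)

Now, we define the variable-length mapping $\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$ by

    $\varphi_n(\mathbf{x}) = \begin{cases} \varphi_n^{(j)}(\mathbf{x}) & \text{for } \mathbf{x} \in S_n^{(j)} \ (\exists j \in J_1), \\ \Lambda & \text{otherwise,} \end{cases}$

where $\Lambda$ denotes the null string of length 0. Then, (2.5.9) can be written as

    $d(U^{(\lfloor n(1 - 2\gamma)R_j \rfloor)}, \varphi_n(X_j^n)) \le 2K^{-n\gamma R_j} \quad (j \in J_1;\ \forall n = 1, 2, \cdots).$    (2.5.10)

Notice here that we have

    $(1 - 2\gamma)R_{j+1} > (1 - 2\gamma)R_j > 0$

for $j \in J_1$ because $R_j = 3j\gamma$. This implies that the lengths $\lfloor n(1 - 2\gamma)R_j \rfloor$
of the ranges of the mappings $\varphi_n^{(j)}$, $\forall j \in J_1$, are distinct for all
sufficiently large $n$. In addition, if we set $\mathcal{C}_n = \varphi_n^{-1}(\Lambda)$ and define a random
variable $X_0^n$ over $\mathcal{C}_n$ by

    $P_{X_0^n}(\mathbf{x}) = \frac{P_{X^n}(\mathbf{x})}{\Pr\{X^n \in \mathcal{C}_n\}} \quad (\mathbf{x} \in \mathcal{C}_n),$

we trivially have

    $d(U^{(0)}, \varphi_n(X_0^n)) = 0 \quad (\forall n = 1, 2, \cdots).$    (2.5.11)

We now set

    $J(\varphi_n) = \{0\} \cup \{\lfloor n(1 - 2\gamma)R_j \rfloor \mid j \in J_1\}$

and for each $m \in J(\varphi_n) - \{0\}$ define a random variable $X_m^n$ by

    $X_m^n = X_j^n \quad (m = \lfloor n(1 - 2\gamma)R_j \rfloor).$

By noticing $R_j \ge 3\gamma$ ($j \in J_1$), (2.5.10) and (2.5.11) are summarized as

    $\sup_{m \in J(\varphi_n)} d(U^{(m)}, \varphi_n(X_m^n)) \le 2K^{-3n\gamma^2},$

which implies

    $\lim_{n\to\infty} \sup_{m \in J(\varphi_n)} d(U^{(m)}, \varphi_n(X_m^n)) = 0.$    (2.5.12)

Next, let us evaluate $\mathrm{E}|\varphi_n(X^n)|$. First, $\mathrm{E}|\varphi_n(X^n)|$ is evaluated in the
following way:

    $\mathrm{E}|\varphi_n(X^n)|$
    $\quad = \sum_{m \in J(\varphi_n)} m \Pr\{|\varphi_n(X^n)| = m\}$
    $\quad = \sum_{j \in J_1} \lfloor n(1 - 2\gamma)R_j \rfloor \Pr\{X^n \in S_n^{(j)}\}$
    $\quad \ge \sum_{j \in J_1} n(1 - 2\gamma)R_j \Pr\{X^n \in S_n^{(j)}\} - 1$
    $\quad = n(1 - 2\gamma) \sum_{j \in J_1} R_{j+1} \Pr\{X^n \in S_n^{(j)}\} - 3n\gamma(1 - 2\gamma) \sum_{j \in J_1} \Pr\{X^n \in S_n^{(j)}\} - 1$
    $\quad \ge n(1 - 2\gamma) \sum_{j \in J_1} R_{j+1} \Pr\{X^n \in S_n^{(j)}\} - 3n\gamma(1 - 2\gamma) - 1$
    $\quad = n(1 - 2\gamma) \sum_{j \in J} R_{j+1} \Pr\{X^n \in S_n^{(j)}\} - n(1 - 2\gamma) \sum_{j \in J_2} R_{j+1} \Pr\{X^n \in S_n^{(j)}\} - 3n\gamma(1 - 2\gamma) - 1$
    $\quad = n(1 - 2\gamma) \sum_{j \in J} R_{j+1} \Pr\{X^n \in S_n^{(j)}\} - n(1 - 2\gamma) \sum_{j \in J_2 - \{0\}} R_{j+1} \Pr\{X^n \in S_n^{(j)}\}$
    $\qquad - 3n\gamma(1 - 2\gamma) \Pr\{X^n \in S_n^{(0)}\} - 3n\gamma(1 - 2\gamma) - 1$
    $\quad \ge n(1 - 2\gamma) \sum_{j \in J} R_{j+1} \Pr\{X^n \in S_n^{(j)}\} - n(1 - 2\gamma) \sum_{j \in J_2 - \{0\}} R_{j+1} \Pr\{X^n \in S_n^{(j)}\} - 6n\gamma(1 - 2\gamma) - 1.$    (2.5.13)

We note here that, if $j \in J_2 - \{0\}$, $\Pr\{X^n \in S_n^{(j)}\} < K^{-n\gamma R_j}$ follows from the
definition. By using this, the second term on the right-hand side of (2.5.13)
is evaluated as

    $\sum_{j \in J_2 - \{0\}} R_{j+1} \Pr\{X^n \in S_n^{(j)}\}$
    $\quad \le 3\gamma \sum_{j \in J_2 - \{0\}} (j + 1) K^{-3n\gamma^2 j}$
    $\quad \le 3\gamma \sum_{j=1}^{\infty} (j + 1) K^{-3n\gamma^2 j}$
    $\quad = \frac{3\gamma K^{-3n\gamma^2}}{1 - K^{-3n\gamma^2}} + \frac{3\gamma K^{-3n\gamma^2}}{(1 - K^{-3n\gamma^2})^2}$
    $\quad \le \frac{6\gamma K^{-3n\gamma^2}}{(1 - K^{-3n\gamma^2})^2}.$

Thus, (2.5.13) can be written as

    $\mathrm{E}|\varphi_n(X^n)| \ge n(1 - 2\gamma) \sum_{j \in J} R_{j+1} \Pr\{X^n \in S_n^{(j)}\} - \frac{6n\gamma(1 - 2\gamma) K^{-3n\gamma^2}}{(1 - K^{-3n\gamma^2})^2} - 6n\gamma(1 - 2\gamma) - 1.$

Consequently, it follows that

    $\frac{1}{n} \mathrm{E}|\varphi_n(X^n)|$
    $\quad \ge (1 - 2\gamma) \sum_{j \in J} R_{j+1} \Pr\{X^n \in S_n^{(j)}\} - \frac{6\gamma(1 - 2\gamma) K^{-3n\gamma^2}}{(1 - K^{-3n\gamma^2})^2} - 6\gamma(1 - 2\gamma) - \frac{1}{n}$
    $\quad \ge (1 - 2\gamma) \frac{1}{n} H_K(X^n) - \frac{6\gamma(1 - 2\gamma) K^{-3n\gamma^2}}{(1 - K^{-3n\gamma^2})^2} - 6\gamma(1 - 2\gamma) - \frac{1}{n},$    (2.5.14)

where the second inequality holds because $\frac{1}{n} \log_K \frac{1}{P_{X^n}(\mathbf{x})} < R_{j+1}$ for
$\mathbf{x} \in S_n^{(j)}$. By taking liminf of both sides of (2.5.14), we obtain

    $\liminf_{n\to\infty} \frac{1}{n} \mathrm{E}|\varphi_n(X^n)|$
    $\quad \ge (1 - 2\gamma) \liminf_{n\to\infty} \frac{1}{n} H_K(X^n) - 6\gamma(1 - 2\gamma)$
    $\quad = \liminf_{n\to\infty} \frac{1}{n} H_K(X^n) - 2\gamma \left( \liminf_{n\to\infty} \frac{1}{n} H_K(X^n) \right) - 6\gamma(1 - 2\gamma).$    (2.5.15)

Since $\gamma > 0$ can be arbitrarily small, the combination of (2.5.12) and (2.5.15)
means that any rate $R$ satisfying

    $R < \liminf_{n\to\infty} \frac{1}{n} H_K(X^n)$

is achievable.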
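
The information-spectrum slicing used in the direct part can be illustrated numerically. The Python sketch below treats an i.i.d. Bernoulli($p$) source (an assumption made so that sequences can be grouped by their number of ones), builds the slices $I_j = [3j\gamma, 3(j+1)\gamma)$, and compares $\sum_j R_{j+1} \Pr\{X^n \in S_n^{(j)}\}$ with $\frac{1}{n} H_K(X^n)$, which is the comparison behind the second inequality of (2.5.14). The function name is hypothetical.

```python
from math import comb, floor, log

def sliced_rate_bound(n, p, gamma, K=2):
    """Return (H_K(X^n)/n, sum_j R_{j+1} * Pr{X^n in S_n^(j)}) for an i.i.d.
    Bernoulli(p) source, with slices I_j = [3*j*gamma, 3*(j+1)*gamma)."""
    width = 3.0 * gamma
    slice_prob = {}
    entropy_rate = 0.0
    for k in range(n + 1):
        seq_prob = p ** k * (1.0 - p) ** (n - k)   # one sequence with k ones
        class_prob = comb(n, k) * seq_prob         # all sequences with k ones
        info_rate = -log(seq_prob, K) / n          # (1/n) log_K 1/P(x)
        j = floor(info_rate / width)               # slice index containing info_rate
        slice_prob[j] = slice_prob.get(j, 0.0) + class_prob
        entropy_rate += class_prob * info_rate
    upper = sum(width * (j + 1) * q for j, q in slice_prob.items())
    return entropy_rate, upper
```

Because every sequence in slice $j$ has normalized self-information below $R_{j+1}$ and at least $R_j = R_{j+1} - 3\gamma$, the second returned value lies between the entropy rate and the entropy rate plus $3\gamma$; shrinking `gamma` tightens the gap.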



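2) Converse part:

Suppose that $R$ is an achievable rate. That is, suppose that a mapping
$\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$ satisfying

    $\liminf_{n\to\infty} \frac{1}{n} \mathrm{E}|\varphi_n(X^n)| \ge R$    (2.5.16)

and

    $\lim_{n\to\infty} \sup_{m \in J(\varphi_n)} d(U^{(m)}, \varphi_n(X_m^n)) = 0$    (2.5.17)

is given. Set

    $\mathcal{D}_m = \{\mathbf{x} \in \mathcal{X}^n \mid |\varphi_n(\mathbf{x})| = m\}$

and denote by $X_m^n$ the random variable subject to the conditional probability
distribution of $X^n$ given $X^n \in \mathcal{D}_m$ expressed as

    $P_{X_m^n}(\mathbf{x}) = \frac{P_{X^n}(\mathbf{x})}{\Pr\{X^n \in \mathcal{D}_m\}} \quad (\mathbf{x} \in \mathcal{D}_m).$

We now define the random variable $I_n$ satisfying $I_n = m$ if $X^n \in \mathcal{D}_m$. Then,
the probability distribution of $I_n$ is given by

    $P_{I_n}(m) = \Pr\{X^n \in \mathcal{D}_m\} \quad (m \in J(\varphi_n)).$

We can obtain a lower bound of the entropy $H_K(X^n)$ as follows:

    $H_K(X^n) \ge H_K(X^n | I_n)$
    $\quad = \sum_{m \in J(\varphi_n)} H_K(X_m^n) \Pr\{X^n \in \mathcal{D}_m\}$
    $\quad \ge \sum_{m \in J(\varphi_n)} H_K(\varphi_n(X_m^n)) \Pr\{X^n \in \mathcal{D}_m\}.$    (2.5.18)

Here, setting

    $\delta_n^{(m)} \equiv d(U^{(m)}, \varphi_n(X_m^n)) \quad (m \in J(\varphi_n)),$

applying the inequality (cf. Csiszár and Körner [19])

    $|H_K(\varphi_n(X_m^n)) - H_K(U^{(m)})| \le \delta_n^{(m)} \log_K \frac{K^m}{\delta_n^{(m)}}$    (2.5.19)

and considering $H_K(U^{(m)}) = m$, (2.5.18) can be evaluated as

    $H_K(X^n)$
    $\quad \ge \sum_{m \in J(\varphi_n)} m \Pr\{X^n \in \mathcal{D}_m\} + \sum_{m \in J(\varphi_n)} (\delta_n^{(m)} \log_K \delta_n^{(m)} - m\delta_n^{(m)}) \Pr\{X^n \in \mathcal{D}_m\}$
    $\quad = \mathrm{E}|\varphi_n(X^n)| + \sum_{m \in J(\varphi_n)} (\delta_n^{(m)} \log_K \delta_n^{(m)} - m\delta_n^{(m)}) \Pr\{X^n \in \mathcal{D}_m\}.$    (2.5.20)

Since (2.5.17) means that there exists a sequence $\{\delta_n > 0\}$ satisfying $\delta_n \to 0$
as $n \to \infty$ and

    $\delta_n^{(m)} \le \delta_n \quad (\forall m \in J(\varphi_n)),$

(2.5.20) can be written as

    $H_K(X^n) \ge \mathrm{E}|\varphi_n(X^n)| + \sum_{m \in J(\varphi_n)} (\delta_n \log_K \delta_n - m\delta_n) \Pr\{X^n \in \mathcal{D}_m\}$
    $\quad = (1 - \delta_n) \mathrm{E}|\varphi_n(X^n)| + \delta_n \log_K \delta_n,$

which leads to

    $\frac{1}{n} H_K(X^n) \ge \frac{1 - \delta_n}{n} \mathrm{E}|\varphi_n(X^n)| + \frac{\delta_n}{n} \log_K \delta_n.$    (2.5.21)

Taking liminf of both sides of (2.5.21) and noticing $\delta_n \to 0$ as $n \to \infty$, we
obtain

    $\liminf_{n\to\infty} \frac{1}{n} H_K(X^n) \ge \liminf_{n\to\infty} \frac{1}{n} \mathrm{E}|\varphi_n(X^n)| \ge R,$

where the last inequality follows from (2.5.16). That is, any achievable rate
$R$ cannot be greater than $\liminf_{n\to\infty} \frac{1}{n} H_K(X^n)$. □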
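
The continuity bound (2.5.19) is easy to check numerically for a given output distribution. The Python sketch below compares $|H_K(P) - m|$ with $d \log_K (K^m / d)$, where $d$ is the variational distance between $P$ and the uniform distribution on $K^m$ points. The function names are hypothetical, and the check assumes $d \le 1/2$, the range in which this form of the bound is known to hold.

```python
from math import log

def entropy(dist, K):
    """Entropy of a finite distribution with logarithms to base K."""
    return -sum(q * log(q, K) for q in dist if q > 0)

def entropy_continuity_sides(dist, K, m):
    """Return (|H_K(dist) - m|, d * log_K(K^m / d), d), where d is the
    variational distance between dist and the uniform law on K^m points."""
    size = K ** m
    if len(dist) != size:
        raise ValueError("dist must have K**m entries")
    d = sum(abs(q - 1.0 / size) for q in dist)
    gap = abs(entropy(dist, K) - m)
    bound = d * log(size / d, K) if d > 0 else 0.0
    return gap, bound, d
```

Expanding the bound as $m d - d \log_K d$ shows where the two correction terms in (2.5.20) come from.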



Remark 2.5.2. Notice that Theorem 2.5.1 given above holds even if $\mathcal{X}$ is a
countably infinite source alphabet. Theorem 2.5.1 was proved by Vembu and
Verdú [88] for the case that $\mathcal{X}$ is a finite set. □

Remark 2.5.3. As is easily seen from the proof of Theorem 2.5.1,
Theorem 2.5.1 still holds even if we replace the "supremum of the variational
distance"

    $\sup_{m \in J(\varphi_n)} d(U^{(m)}, \varphi_n(X_m^n))$

in Definition 2.5.2 with the stronger "sum of the variational distance"

    $\sum_{m \in J(\varphi_n)} d(U^{(m)}, \varphi_n(X_m^n)).$ □

Remark 2.5.4. The direct part in the proof of Theorem 2.5.1 actually
claims that the variational distance goes to 0 in the exponential order of
block length $n$. That is, for an arbitrary rate $R$ satisfying $0 < R < S_v(\mathbf{X})$
there exists a mapping $\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$ satisfying

    $\liminf_{n\to\infty} \frac{1}{n} \mathrm{E}|\varphi_n(X^n)| \ge R$

and

    $\sup_{m \in J(\varphi_n)} d(U^{(m)}, \varphi_n(X_m^n)) \le K^{-nE(R)} \quad (\forall n \ge n_0).$

Here, $E(R)$ is a positive and monotone decreasing function in the interval
$0 < R < S_v(\mathbf{X})$. □

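Remark 2.5.5. If $\mathcal{X}$ is a finite source alphabet,

    $J(\varphi_n) = \{m \mid \Pr\{|\varphi_n(X^n)| = m\} > 0\}$

also becomes a finite set for a transform $\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$ generating the
variable-length uniform random number. In this case it holds that

    $\max_{m \in J(\varphi_n)} m \le n(\log_K |\mathcal{X}| + \gamma) \quad (\forall n \ge n_0)$    (2.5.22)

for an arbitrarily small $\gamma > 0$, which can be proved in the following way.
Suppose that (2.5.22) does not hold. Then, there exist sequences of integers
$\{n_i\}_{i=1}^{\infty}$ and $\{m_i\}_{i=1}^{\infty}$ satisfying $m_i \in J(\varphi_{n_i})$ and $m_i > n_i(\log_K |\mathcal{X}| + \gamma)$,
where $n_i \to \infty$ as $i \to \infty$. Setting

    $G_{n_i} = \{\varphi_{n_i}(\mathbf{x}) \mid \mathbf{x} \in \mathcal{D}_{m_i}\},$

$|G_{n_i}| \le |\mathcal{X}|^{n_i}$ follows since $\varphi_{n_i}(\mathbf{x})$ ($\mathbf{x} \in \mathcal{X}^{n_i}$) can take at most $|\mathcal{X}|^{n_i}$ values.
Hence, we obtain

    $d(U^{(m_i)}, \varphi_{n_i}(X_{m_i}^{n_i}))$
    $\quad = \sum_{\mathbf{u} \in \mathcal{U}^{m_i}} |P_{U^{(m_i)}}(\mathbf{u}) - P_{\varphi_{n_i}(X_{m_i}^{n_i})}(\mathbf{u})|$
    $\quad \ge \sum_{\mathbf{u} \in \mathcal{U}^{m_i} - G_{n_i}} P_{U^{(m_i)}}(\mathbf{u})$
    $\quad = (K^{m_i} - |G_{n_i}|) K^{-m_i}$
    $\quad \ge (K^{m_i} - |\mathcal{X}|^{n_i}) K^{-m_i}$
    $\quad = 1 - |\mathcal{X}|^{n_i} K^{-m_i}$
    $\quad > 1 - |\mathcal{X}|^{n_i} K^{-n_i(\log_K |\mathcal{X}| + \gamma)}$
    $\quad = 1 - K^{-n_i \gamma}.$

Accordingly, it follows that

    $\liminf_{i\to\infty} d(U^{(m_i)}, \varphi_{n_i}(X_{m_i}^{n_i})) \ge 1,$

which contradicts Definition 2.5.2 on the variable-length uniform random
number generating rate $R$. This establishes (2.5.22). □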

Remark 2.5.6. Elias [25] gave the definition of the variable-length uniform
random number generator using

    $d(U^{(m)}, \varphi_n(X_m^n)) = 0 \quad (\forall m \in J(\varphi_n))$    (2.5.23)

instead of

    $\lim_{n\to\infty} \sup_{m \in J(\varphi_n)} d(U^{(m)}, \varphi_n(X_m^n)) = 0$

in Definition 2.5.2. Under the condition (2.5.23) he showed that
$S_v(\mathbf{X}) = \lim_{n\to\infty} \frac{1}{n} H_K(X^n)$ when $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is a stationary memoryless
process with binary alphabet $\{0, 1\}$ or a two-state Markov process. □






We can also consider the variable-length intrinsic randomness problem
using the divergence distance $D(\varphi_n(X_m^n) \| U^{(m)})$ instead of the variational
distance $d(U^{(m)}, \varphi_n(X_m^n))$ in Definition 2.5.2. In order to formulate this
problem we have only to replace Definition 2.5.2 and Definition 2.5.3 with,
respectively:

Definition 2.5.4.

    Rate $R$ is achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists a transform
    $\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$ satisfying $\liminf_{n\to\infty} \frac{1}{n} \mathrm{E}|\varphi_n(X^n)| \ge R$ and

    $\lim_{n\to\infty} \sup_{m \in J(\varphi_n)} D(\varphi_n(X_m^n) \| U^{(m)}) = 0.$

Definition 2.5.5 (Variable-length intrinsic randomness: divergence
distance).

    $S_v^+(\mathbf{X}) = \sup \{R \mid R \text{ is achievable}\}.$

Then, we have the following theorem that corresponds to Theorem 2.2.3
treating the fixed-length intrinsic randomness.

Theorem 2.5.2 (Han [40]).

    $S_v^+(\mathbf{X}) = \liminf_{n\to\infty} \frac{1}{n} H_K(X^n),$    (2.5.24)

where $K$ on the right-hand side means the base of logarithms.

Proof. 

1) Direct part: 

We can define $\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$ in the same way as the direct part in the
proof of Theorem 2.5.1. We use the following evaluation on the divergence
distance instead of the evaluation of the variational distance (2.5.9):

< [n(l - 2'y)Rj\ ^ 

(See the proof of the direct part of Theorem 2.2.3. Notice, however, that we
have $\Pr\{X^n \notin S_n(a + \gamma)\} = 0$ here.) As a consequence, we obtain

    $\lim_{n\to\infty} \sup_{j \in J_1} D(\varphi_n^{(j)}(X_j^n) \| U^{(\lfloor n(1 - 2\gamma)R_j \rfloor)}) = 0.$

2) Converse part:

The proof of the converse part parallels the proof of the converse part of
Theorem 2.5.1. Here, we use the equality

    $H_K(\varphi_n(X_m^n)) = m - D(\varphi_n(X_m^n) \| U^{(m)})$

instead of the inequality (2.5.19). □



Remark 2.5.7. As can be easily seen from the proof of Theorem 2.5.2,
Theorem 2.5.2 still holds even if we replace the "supremum of the divergence
distance"

    $\sup_{m \in J(\varphi_n)} D(\varphi_n(X_m^n) \| U^{(m)})$

in Definition 2.5.4 with the stronger "sum of the divergence distance"

    $\sum_{m \in J(\varphi_n)} D(\varphi_n(X_m^n) \| U^{(m)}).$ □

Remark 2.5.8. Since the inequality (cf. Pinsker [76], Csiszár and Körner
[19])

    $\frac{\log_K e}{2} \left( d(\varphi_n(X_m^n), U^{(m)}) \right)^2 \le D(\varphi_n(X_m^n) \| U^{(m)})$

always holds with respect to the variational distance and the divergence
distance, a rate $R$ is also achievable in the sense of Definition 2.5.2 if $R$ is
achievable in the sense of Definition 2.5.4. Hence, the claim of Theorem 2.5.2
is stronger than the claim of Theorem 2.5.1 in the achievability part. □



Remark 2.5.9. Similarly to Theorem 2.5.2, Theorem 2.5.1 holds even if X 
is a countably infinite alphabet. Notice here that we do not use the divergence 
distance in the form of 

(2.5.25) 

but the form of 

D(^„(X;^)||f/('")) (2.5.26) 

in Definition 2.5.4 defining 6'+(X), the variable-length intrinsic randomness 
based on the divergence distance. Actually, we used — in- 
stead of Mrt) in Definition 2.2.5. This means that Theorem 2.5.2 

is stronger than Theorem 2.2.3 in the achievability part. On the other hand, 
Vembu and Verdu [88] showed that a theorem corresponding to Theorem 2.5.2 
holds under the definition using (2.5.25) instead of (2.5.26) in Definition 2.5.4 
for the case that A is a finite alphabet. However, as is seen from the proof 
of Theorem 2.5.2, the theorem in this form holds even if A' is a countably 
infinite alphabet. □ 



Next, let us consider another intrinsic randomness problem not using
$$\sup_{m \in \mathcal{J}(\varphi_n)} D(\varphi_n(X^n_m) \,\|\, U^{(m)})$$
in Definition 2.5.4 but using the normalized conditional divergence distance
$$\frac{1}{n} D(\varphi_n(X^n) \,\|\, U^{(I_n)} \mid I_n), \qquad (2.5.27)$$
which leads to a weaker statement. Here, the conditional divergence distance in (2.5.27) is defined by
$$D(\varphi_n(X^n) \,\|\, U^{(I_n)} \mid I_n) = \sum_{m \in \mathcal{J}(\varphi_n)} \Pr\{I_n = m\}\, D(\varphi_n(X^n_m) \,\|\, U^{(m)}), \qquad (2.5.28)$$
where $I_n$ is the random variable satisfying $I_n = m$ if $X^n \in \mathcal{D}_m$.

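To formulate this problem we have only to replace Definition 2.5.4 and Definition 2.5.5 with, respectively:

Definition 2.5.6.

Rate $R$ is achievable $\overset{\mathrm{def}}{\Longleftrightarrow}$ There exists a transform $\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$ satisfying
$$\liminf_{n\to\infty} \frac{1}{n}\mathrm{E}|\varphi_n(X^n)| \ge R \quad\text{and}\quad \lim_{n\to\infty} \frac{1}{n} D(\varphi_n(X^n) \,\|\, U^{(I_n)} \mid I_n) = 0.$$

Definition 2.5.7 (Variable-length intrinsic randomness: normalized conditional divergence distance).
$$S^{*}(\mathbf{X}) = \sup \{R \mid R \text{ is achievable}\}.$$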



Then, we have the following theorem corresponding to Theorem 2.5.2. This theorem is deeply related to Theorem 2.6.5 in §2.6 treating the variable-length source coding.

Theorem 2.5.3 (Han [40]).
$$S^{*}(\mathbf{X}) = \liminf_{n\to\infty} \frac{1}{n} H_K(X^n), \qquad (2.5.29)$$
where $K$ on the right-hand side of (2.5.29) means the base of logarithms.

Proof.

1) Direct part:

We define $\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$ in the same way as in the proof of the direct part of Theorem 2.5.2.

2) Converse part:

We can prove this in the same way as the proof of the converse part of Theorem 2.5.1. Here, we use the equality
$$H_K(\varphi_n(X^n_m)) = m - D(\varphi_n(X^n_m) \,\|\, U^{(m)})$$
instead of the inequality (2.5.19). □



Remark 2.5.10. Theorem 2.5.3 still holds if the range J{<Pn) of 
is restricted to 

J{}fn) C {m I m > rUn} , 

where {nin} is an arbitrary sequence of nonnegative integers satisfying 







This fact is in contrast with Theorem 2.5.1 (and Theorem 2.5.2) that only 
holds, as is mentioned in Remark 2.5.1, under the condition of {0} C 



2.6 Random Number Generation and Source Coding 

The random number generation (the probability distribution approximation) 
problems treated in this chapter are deeply related to the source coding prob- 
lem described in Chapter 1. This section is devoted to description of the 
relationship between the random number generation and the source coding. 

Consider an arbitrary $(n, M_n, \varepsilon_n)$-code (fixed-length code) for a given general source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$. Denote the encoder and the decoder corresponding to the $(n, M_n, \varepsilon_n)$-code by $\varphi_n : \mathcal{X}^n \to \mathcal{M}_n = \{1, 2, \dots, M_n\}$ and $\psi_n : \mathcal{M}_n \to \mathcal{X}^n$, respectively. Let $\hat{X}^n = \psi_n(\varphi_n(X^n))$ denote the reproduced information obtained from the encoding of information $X^n$ from the source $\mathbf{X}$ followed by the decoding of $\varphi_n(X^n)$. Then, the error probability of this code is given by
$$\varepsilon_n = \Pr\{X^n \ne \hat{X}^n\} \qquad (2.6.1)$$
from its definition. The following general lemma holds with respect to the error probability (2.6.1) and the variational distance $d(X^n, \hat{X}^n)$ between $X^n$ and $\hat{X}^n$.



Lemma 2.6.1. For all $n = 1, 2, \dots$,
$$d(X^n, \hat{X}^n) \le 2\varepsilon_n. \qquad (2.6.2)$$



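Proof. Since the error probability $\varepsilon_n$ is written as
$$\varepsilon_n = \sum_{\mathbf{x} \in \mathcal{X}^n} \sum_{\mathbf{x}' : \mathbf{x}' \ne \mathbf{x}} P_{X^n \hat{X}^n}(\mathbf{x}, \mathbf{x}'),$$
the variational distance is evaluated in the following way:
$$d(X^n, \hat{X}^n) = \sum_{\mathbf{x} \in \mathcal{X}^n} \left| P_{X^n}(\mathbf{x}) - P_{\hat{X}^n}(\mathbf{x}) \right|$$
$$= \sum_{\mathbf{x} \in \mathcal{X}^n} \Bigl| \sum_{\mathbf{x}' : \mathbf{x}' \ne \mathbf{x}} P_{X^n \hat{X}^n}(\mathbf{x}, \mathbf{x}') - \sum_{\mathbf{x}' : \mathbf{x}' \ne \mathbf{x}} P_{X^n \hat{X}^n}(\mathbf{x}', \mathbf{x}) \Bigr|$$
$$\le \sum_{\mathbf{x} \in \mathcal{X}^n} \sum_{\mathbf{x}' : \mathbf{x}' \ne \mathbf{x}} P_{X^n \hat{X}^n}(\mathbf{x}, \mathbf{x}') + \sum_{\mathbf{x} \in \mathcal{X}^n} \sum_{\mathbf{x}' : \mathbf{x}' \ne \mathbf{x}} P_{X^n \hat{X}^n}(\mathbf{x}', \mathbf{x})$$
$$= 2\varepsilon_n. \qquad □$$
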

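Remark 2.6.1. Let $Z$ and $V$ be random variables taking values in the same countably infinite alphabet $\mathcal{Z}$ and denote by $\mathcal{P}$ the set of all joint probability distributions $P_{ZV}$ over $\mathcal{Z} \times \mathcal{Z}$ satisfying
$$\sum_{v \in \mathcal{Z}} P_{ZV}(z, v) = P_Z(z) \quad (\forall z \in \mathcal{Z}),$$
$$\sum_{z \in \mathcal{Z}} P_{ZV}(z, v) = P_V(v) \quad (\forall v \in \mathcal{Z}).$$
It is known that
$$d(Z, V) = 2 \inf_{P_{ZV} \in \mathcal{P}} \Pr\{Z \ne V\} \qquad (2.6.3)$$
(cf. Gray and Ornstein [34], Strassen [86]). Therefore, (2.6.2) can be viewed as a consequence of (2.6.3). □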
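
As a numerical illustration of Lemma 2.6.1, the following minimal sketch draws random joint distributions of a pair (X, X̂) on a small alphabet and checks that the variational distance between the marginals never exceeds twice Pr{X ≠ X̂}. The helper names (`variational_distance`, `distance_and_error`, `random_joint`) and the alphabet size are illustrative assumptions, not part of the text; the distance is taken without the factor 1/2, as in (2.6.2).

```python
import random


def variational_distance(p, q):
    """d(P, Q) = sum_x |P(x) - Q(x)| (no factor 1/2)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def distance_and_error(joint):
    """joint[x][x2] = Pr{X = x, Xhat = x2}; returns (d(X, Xhat), Pr{X != Xhat})."""
    k = len(joint)
    p_x = [sum(joint[x]) for x in range(k)]
    p_xhat = [sum(joint[x][x2] for x in range(k)) for x2 in range(k)]
    eps = sum(joint[x][x2] for x in range(k) for x2 in range(k) if x != x2)
    return variational_distance(p_x, p_xhat), eps


def random_joint(k, rng):
    """A random joint distribution on a k-by-k grid (hypothetical test input)."""
    w = [[rng.random() for _ in range(k)] for _ in range(k)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]


rng = random.Random(0)
results = [distance_and_error(random_joint(4, rng)) for _ in range(200)]
all_ok = all(d <= 2 * eps + 1e-12 for d, eps in results)
print(all_ok)
```

The check only exercises the inequality; it does not show tightness, which is what the coupling characterization (2.6.3) addresses.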



Lemma 2.6.1 immediately yields the following theorem. 

Theorem 2.6.1. Consider a general source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$. For an arbitrary fixed-length code $(n, M_n, \varepsilon_n)$ with a pair of an encoder and a decoder $(\varphi_n, \psi_n)$, if the error probability $\varepsilon_n$ satisfies $\lim_{n\to\infty} \varepsilon_n = 0$, then the variational distance between $X^n$ and $\hat{X}^n = \psi_n(\varphi_n(X^n))$ satisfies $\lim_{n\to\infty} d(X^n, \hat{X}^n) = 0$. That is, if $\lim_{n\to\infty} \varepsilon_n = 0$, then the probability distribution of the reproduced information $\hat{X}^n$ becomes arbitrarily close to the probability distribution of the source output $X^n$ in the sense of the vanishing variational distance as the block length $n$ tends to infinity. □



Theorem 2.6.1 tells us that a source coding problem can be viewed as a probability distribution approximation problem. Let us consider the following random number generation problem in order to acquire a deep insight into this point. Suppose that we use a source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ as the coin random number as well as the target random number. We consider a transform $\phi_n : \mathcal{X}^n \to \mathcal{X}^n$ satisfying
$$\lim_{n\to\infty} d(X^n, \phi_n(X^n)) = 0 \qquad (2.6.4)$$
and try to find the minimal size of $\phi_n(X^n)$ under this constraint. This formulates the problem called the self-random generation problem that, given a random number, we generate the same random number from the given random number at minimal expense. In addition, we call the random number $\phi_n(X^n)$ satisfying (2.6.4) the self-random number of $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$. In order to formulate this problem we give the following definitions. Here, for a mapping $f$, $\|f\|$ denotes the size of the range of $f$.

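Definition 2.6.1.

Rate $R$ is achievable $\overset{\mathrm{def}}{\Longleftrightarrow}$ There exists a mapping $\phi_n : \mathcal{X}^n \to \mathcal{X}^n$ satisfying
$$\limsup_{n\to\infty} \frac{1}{n} \log \|\phi_n\| \le R \quad\text{and}\quad \lim_{n\to\infty} d(X^n, \phi_n(X^n)) = 0.$$

Definition 2.6.2 (Infimum achievable self-random number generating rate).
$$S_o(\mathbf{X}) = \inf \{R \mid R \text{ is achievable}\}.$$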



We have the following theorem under these definitions. 

Theorem 2.6.2.
$$S_o(\mathbf{X}) = \overline{H}(\mathbf{X}).$$

Proof.

1) Direct part:

For any rate $R$ satisfying $R > \overline{H}(\mathbf{X})$, Theorem 1.3.1 in Chapter 1 guarantees the existence of an $(n, M_n, \varepsilon_n)$-code (fixed-length code) satisfying
$$\lim_{n\to\infty} \varepsilon_n = 0 \quad\text{and}\quad \limsup_{n\to\infty} \frac{1}{n} \log M_n \le R. \qquad (2.6.5)$$
Denote by $(\varphi_n, \psi_n)$ the pair of the encoder and the decoder corresponding to the $(n, M_n, \varepsilon_n)$-code. Defining a mapping $\phi_n : \mathcal{X}^n \to \mathcal{X}^n$ by $\phi_n(\mathbf{x}) = \psi_n(\varphi_n(\mathbf{x}))$, $\|\phi_n\| \le M_n$ follows from $\|\varphi_n\| \le M_n$. Thus, we obtain from (2.6.5) that
$$\limsup_{n\to\infty} \frac{1}{n} \log \|\phi_n\| \le R. \qquad (2.6.6)$$
On the other hand, since $\lim_{n\to\infty} \varepsilon_n = 0$ from (2.6.5) again, Theorem 2.6.1 implies
$$\lim_{n\to\infty} d(X^n, \phi_n(X^n)) = 0. \qquad (2.6.7)$$
Equations (2.6.6) and (2.6.7) mean that $R$ is achievable as a self-random number generating rate.

2) Converse part: 

We need the following simple lemma for proving the converse part. 






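Lemma 2.6.2. If a source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ satisfies
$$\left| \{\mathbf{x} \in \mathcal{X}^n \mid P_{X^n}(\mathbf{x}) > 0\} \right| \le M_n \qquad (2.6.8)$$
for a positive number $M_n$, we have
$$\Pr\left\{ \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} \ge \frac{1}{n} \log M_n + \gamma \right\} \le e^{-n\gamma} \qquad (2.6.9)$$
for an arbitrary constant $\gamma > 0$.

Proof. Set
$$T_n = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, P_{X^n}(\mathbf{x}) > 0 \text{ and } \frac{1}{n} \log \frac{1}{P_{X^n}(\mathbf{x})} \ge \frac{1}{n} \log M_n + \gamma \right\}.$$
Then, we have $P_{X^n}(\mathbf{x}) \le \frac{1}{M_n} e^{-n\gamma}$ for $\mathbf{x} \in T_n$. We notice here that $|T_n| \le M_n$ from the assumption (2.6.8). Therefore, it follows that
$$\Pr\left\{ \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} \ge \frac{1}{n} \log M_n + \gamma \right\} = \sum_{\mathbf{x} \in T_n} P_{X^n}(\mathbf{x}) \le \sum_{\mathbf{x} \in T_n} \frac{1}{M_n} e^{-n\gamma} \le e^{-n\gamma},$$
which completes the proof of the lemma. □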
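
The bound (2.6.9) can be checked numerically. The sketch below, under the assumption that the support size itself plays the role of $M_n$ and that natural logarithms are used, computes the probability of the event in (2.6.9) for randomly generated skewed distributions and compares it with $e^{-n\gamma}$. The function name `tail_probability` and the test parameters are hypothetical.

```python
import math
import random


def tail_probability(p, n, gamma):
    """Pr{(1/n) log 1/P(X) >= (1/n) log M + gamma}, with M = support size of p."""
    support = [v for v in p if v > 0]
    m = len(support)
    threshold = math.log(m) / n + gamma
    return sum(v for v in support if -math.log(v) / n >= threshold)


rng = random.Random(0)
checks = []
for _ in range(200):
    # Skewed weights so that some symbols have very small probability.
    w = [rng.random() ** 8 + 1e-9 for _ in range(50)]
    p = [v / sum(w) for v in w]
    n, gamma = rng.randint(1, 10), rng.uniform(0.01, 1.0)
    checks.append(tail_probability(p, n, gamma) <= math.exp(-n * gamma) + 1e-12)

lemma_holds = all(checks)
print(lemma_holds)
```

For the uniform distribution the event is empty for every γ > 0, which is the extreme case of the lemma.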



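Now, we prove the converse part. Suppose that $R$ is achievable as a self-random number generating rate. Then, there exists a mapping $\phi_n : \mathcal{X}^n \to \mathcal{X}^n$ satisfying
$$\lim_{n\to\infty} d(X^n, \phi_n(X^n)) = 0 \quad\text{and}\quad \limsup_{n\to\infty} \frac{1}{n} \log \|\phi_n\| \le R. \qquad (2.6.10)$$
Setting $\tilde{X}^n = \phi_n(X^n)$ and letting $\tilde{\mathbf{X}} = \{\tilde{X}^n\}_{n=1}^{\infty}$, we have
$$\left| \{\mathbf{x} \in \mathcal{X}^n \mid P_{\tilde{X}^n}(\mathbf{x}) > 0\} \right| \le \|\phi_n\|. \qquad (2.6.11)$$
Since from (2.6.10) $\|\phi_n\| \le e^{n(R+\gamma)}$ $(\forall n \ge n_0)$ for an arbitrarily small $\gamma > 0$, by substituting this into (2.6.11) we obtain
$$\left| \{\mathbf{x} \in \mathcal{X}^n \mid P_{\tilde{X}^n}(\mathbf{x}) > 0\} \right| \le e^{n(R+\gamma)} \quad (\forall n \ge n_0).$$
Hence, Lemma 2.6.2 yields
$$\Pr\left\{ \frac{1}{n} \log \frac{1}{P_{\tilde{X}^n}(\tilde{X}^n)} \ge R + 2\gamma \right\} \le e^{-n\gamma} \quad (\forall n \ge n_0).$$
Thus, we obtain
$$\lim_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{\tilde{X}^n}(\tilde{X}^n)} \ge R + 2\gamma \right\} = 0,$$
which shows that $R + 2\gamma \ge \overline{H}(\tilde{\mathbf{X}})$. Since $\gamma > 0$ is arbitrary, we have $R \ge \overline{H}(\tilde{\mathbf{X}})$. On the other hand, Corollary 2.1.1 implies $\overline{H}(\tilde{\mathbf{X}}) = \overline{H}(\mathbf{X})$ because $\lim_{n\to\infty} d(X^n, \tilde{X}^n) = 0$ from (2.6.10). Consequently, $R \ge \overline{H}(\mathbf{X})$ must be satisfied. That is, any achievable rate $R$ cannot be less than $\overline{H}(\mathbf{X})$. □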

While 1) in the proof of Theorem 2.6.2 claims that all decoders of the fixed-length code satisfying $\lim_{n\to\infty} \varepsilon_n = 0$ generate the self-random number of a source $\mathbf{X}$, 2) in the proof of Theorem 2.6.2 claims that there is no self-random number generator with a rate less than the minimum rate realized by the fixed-length codes for the source $\mathbf{X}$. This theorem is one of the points of contact where a source coding problem and a random number generation problem meet.

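Let us consider another point of contact where a source coding problem and a random number generation problem meet. In the argument above we are interested in the probability distribution of the reproduced information $\hat{X}^n = \psi_n(\varphi_n(X^n))$ for a given pair of an encoder and a decoder $(\varphi_n, \psi_n)$ for a source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$. Hereafter, we are interested in the probability distribution of the output $\tilde{X}^n = \varphi_n(X^n)$ of the encoder $\varphi_n$. The following fact immediately follows. Suppose that we only consider the case that the error probability satisfies $\lim_{n\to\infty} \varepsilon_n = 0$. Set $\tilde{\mathbf{X}} = \{\tilde{X}^n\}_{n=1}^{\infty}$ and $\hat{\mathbf{X}} = \{\hat{X}^n\}_{n=1}^{\infty}$. While Corollary 2.1.2 tells us that
$$\overline{H}(\mathbf{X}) \ge \overline{H}(\tilde{\mathbf{X}}) \ge \overline{H}(\hat{\mathbf{X}}),$$
$$\underline{H}(\mathbf{X}) \ge \underline{H}(\tilde{\mathbf{X}}) \ge \underline{H}(\hat{\mathbf{X}}),$$
Corollary 2.1.1 and Theorem 2.6.1 show that
$$\overline{H}(\mathbf{X}) = \overline{H}(\hat{\mathbf{X}}),$$
$$\underline{H}(\mathbf{X}) = \underline{H}(\hat{\mathbf{X}}).$$
Hence, we obtain
$$\overline{H}(\mathbf{X}) = \overline{H}(\tilde{\mathbf{X}}) = \overline{H}(\hat{\mathbf{X}}), \qquad (2.6.12)$$
$$\underline{H}(\mathbf{X}) = \underline{H}(\tilde{\mathbf{X}}) = \underline{H}(\hat{\mathbf{X}}). \qquad (2.6.13)$$
This means that the information-spectrum of $\tilde{X}^n = \varphi_n(X^n)$ is distributed between $\underline{H}(\mathbf{X})$ and $\overline{H}(\mathbf{X})$ if $n$ is sufficiently large. The following theorem shows how the information-spectrum of $\tilde{X}^n$ is distributed. The theorem claims that the information-spectrum is asymptotically kept unchanged during the process of the encoding and the decoding satisfying $\lim_{n\to\infty} \varepsilon_n = 0$.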






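Theorem 2.6.3 (Han [40]). Consider an arbitrary $(n, M_n, \varepsilon_n)$-code for a source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$. Denote by $(\varphi_n, \psi_n)$ the pair of the encoder and the decoder corresponding to the code. Setting $\tilde{X}^n = \varphi_n(X^n)$ and $\hat{X}^n = \psi_n(\varphi_n(X^n))$, it holds that
$$\lim_{n\to\infty} L\left( \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)},\ \frac{1}{n} \log \frac{1}{P_{\tilde{X}^n}(\tilde{X}^n)} \right) = 0, \qquad (2.6.14)$$
$$\lim_{n\to\infty} L\left( \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)},\ \frac{1}{n} \log \frac{1}{P_{\hat{X}^n}(\hat{X}^n)} \right) = 0 \qquad (2.6.15)$$
provided that $\lim_{n\to\infty} \varepsilon_n = 0$. That is, the information-spectra of the output from the source $X^n$, the output from the encoder $\tilde{X}^n$ and the output from the decoder $\hat{X}^n$ asymptotically coincide in the process of the fixed-length source coding satisfying $\lim_{n\to\infty} \varepsilon_n = 0$. Here, $L(\cdot, \cdot)$ denotes the Lévy distance defined in Definition 2.1.2.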



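Proof. Since (2.6.15) is obvious from Theorem 2.1.3 and Theorem 2.6.1, we have only to prove (2.6.14). First, for simplicity we set
$$F_n(x) = \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} \le x \right\},$$
$$\tilde{F}_n(x) = \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{\tilde{X}^n}(\tilde{X}^n)} \le x \right\},$$
$$\hat{F}_n(x) = \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{\hat{X}^n}(\hat{X}^n)} \le x \right\}$$
for an arbitrary real number $x$. Set $\mathbf{x}' = \varphi_n(\mathbf{x})$. Since we have $P_{\tilde{X}^n}(\mathbf{x}') \ge P_{X^n}(\mathbf{x})$ and therefore
$$\frac{1}{n} \log \frac{1}{P_{\tilde{X}^n}(\mathbf{x}')} \le \frac{1}{n} \log \frac{1}{P_{X^n}(\mathbf{x})},$$
we obtain
$$\tilde{F}_n(x) \ge F_n(x). \qquad (2.6.16)$$
Similarly, by noticing that $\hat{X}^n = \psi_n(\tilde{X}^n)$, we obtain
$$\hat{F}_n(x) \ge \tilde{F}_n(x) \qquad (2.6.17)$$
as well. On the other hand, in view of the definition of the Lévy distance, (2.6.15) means that there exists a sequence $\{\mu_n\}$ $(\mu_n > 0)$ with $\lim_{n\to\infty} \mu_n = 0$ that satisfies
$$F_n(x - \mu_n) - \mu_n \le \hat{F}_n(x) \le F_n(x + \mu_n) + \mu_n. \qquad (2.6.18)$$
Then, it follows from (2.6.16)-(2.6.18) that
$$F_n(x) \le \tilde{F}_n(x) \le F_n(x + \mu_n) + \mu_n,$$
which establishes (2.6.14). □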

Let us consider here a source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ satisfying the strong converse property in particular. We know from Theorem 2.6.3 that the information-spectrum of the output of an encoder $\tilde{X}^n = \varphi_n(X^n)$ for the source $\mathbf{X}$ asymptotically becomes the one-point spectrum. However, we actually have a stronger claim. That is, we can prove that $\tilde{X}^n$ is approximately subject to a uniform distribution. The following theorem claims that the optimal fixed-length encoder $\varphi_n$ satisfying $\lim_{n\to\infty} \varepsilon_n = 0$ and achieving the infimum achievable fixed-length coding rate $R_f(\mathbf{X})$ (see Definition 1.1.2) can be viewed as a random number generator that generates the fixed-length uniform random number of rate $R_f(\mathbf{X})$ by transforming the source $\mathbf{X}$ (intrinsic randomness problem). Here, we use the normalized divergence distance as a measure of the probability distribution approximation (see Definition 2.2.5).

Theorem 2.6.4. Let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ be an arbitrary source satisfying the strong converse property. Consider an arbitrary fixed-length $(n, M_n, \varepsilon_n)$-code satisfying
$$\lim_{n\to\infty} \varepsilon_n = 0, \qquad (2.6.19)$$
$$\lim_{n\to\infty} \frac{1}{n} \log M_n = R_f(\mathbf{X}) \qquad (2.6.20)$$
and denote by $\varphi_n$ the encoder corresponding to the code. Then, it holds that
$$\lim_{n\to\infty} \frac{1}{n} D(\varphi_n(X^n) \,\|\, U_{M_n}) = 0, \qquad (2.6.21)$$
where $U_{M_n}$ denotes the uniform random number on the range $\mathcal{M}_n = \{1, 2, \dots, M_n\}$ of the encoder $\varphi_n$.

Remark 2.6.2. If the source X = {X'^}^=i does not satisfy the strong 
converse property, (2.6.21) does not hold for any encoder pri satisfying 
(2.6.20). To establish this fact, suppose that (2.6.21) holds for some encoder 

Pn- Theorem 2.2.3 tells us that lim — logM^ <^(X). Then, we obtain 

n-^oo n 

Rf(X.) < H,(X.) from (2.6.20). Since Theorem 1.3.1 tells us that we have 
iJ(X) < S(X), we have i^(X) = which means that X satisfies the 

strong converse property. Hence, this, together with Theorem 2.6.4, implies 
that the strong converse property of the source X is a necessary and sufficient 
condition of (2.6.21) under the conditions (2.6.19) and (2.6.20). □ 






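Proof of Theorem 2.6.4.

Since $\mathbf{X}$ satisfies the strong converse property, we have
$$R_f(\mathbf{X}) = \overline{H}(\mathbf{X}) = \underline{H}(\mathbf{X}) \equiv R$$
from Theorem 1.3.1 and Theorem 1.5.1. Thus, if we set $\tilde{X}^n = \varphi_n(X^n)$ and $\tilde{\mathbf{X}} = \{\tilde{X}^n\}_{n=1}^{\infty}$, (2.6.12) and (2.6.13) imply
$$R_f(\tilde{\mathbf{X}}) = \overline{H}(\tilde{\mathbf{X}}) = \underline{H}(\tilde{\mathbf{X}}) = R, \qquad (2.6.22)$$
which means that $\tilde{\mathbf{X}}$ satisfies the strong converse property. For $\gamma > 0$ define
$$T_n = \left\{ m \in \mathcal{M}_n \,\middle|\, \frac{1}{n} \log \frac{1}{P_{\tilde{X}^n}(m)} \ge R - \gamma \right\},$$
$$\delta_n = \Pr\{\tilde{X}^n \notin T_n\}.$$
Then, we have
$$\lim_{n\to\infty} \delta_n = 0 \qquad (2.6.23)$$
from the definition of $\underline{H}(\tilde{\mathbf{X}}) = R$. Furthermore, (2.6.20) and (2.6.22) imply
$$M_n \le e^{n(R+\gamma)} \quad (\forall n \ge n_0). \qquad (2.6.24)$$
Now, noticing that $P_{U_{M_n}}(m) = \frac{1}{M_n}$, we evaluate the divergence between $\tilde{X}^n$ and $U_{M_n}$ in the following way:
$$D(\tilde{X}^n \,\|\, U_{M_n}) = \sum_{m \in \mathcal{M}_n} P_{\tilde{X}^n}(m) \log \frac{P_{\tilde{X}^n}(m)}{P_{U_{M_n}}(m)}$$
$$= \sum_{m \in \mathcal{M}_n} P_{\tilde{X}^n}(m) \log \bigl(P_{\tilde{X}^n}(m) M_n\bigr)$$
$$= \sum_{m \in \mathcal{M}_n} P_{\tilde{X}^n}(m) \log P_{\tilde{X}^n}(m) + \log M_n$$
$$= \sum_{m \in T_n} P_{\tilde{X}^n}(m) \log P_{\tilde{X}^n}(m) + \sum_{m \notin T_n} P_{\tilde{X}^n}(m) \log P_{\tilde{X}^n}(m) + \log M_n$$
$$\le \sum_{m \in T_n} P_{\tilde{X}^n}(m) \log P_{\tilde{X}^n}(m) + \log M_n, \qquad (2.6.25)$$
where the last inequality follows from $P_{\tilde{X}^n}(m) \le 1$. If we notice here that $P_{\tilde{X}^n}(m) \le e^{-n(R-\gamma)}$ for $m \in T_n$, by substituting this and (2.6.24) into (2.6.25), it follows that
$$D(\tilde{X}^n \,\|\, U_{M_n}) \le -n(R - \gamma) \Pr\{\tilde{X}^n \in T_n\} + n(R + \gamma)$$
$$= -n(R - \gamma)(1 - \delta_n) + n(R + \gamma)$$
$$= 2n\gamma + n(R - \gamma)\delta_n.$$
Thus, we obtain
$$\limsup_{n\to\infty} \frac{1}{n} D(\tilde{X}^n \,\|\, U_{M_n}) \le 2\gamma$$
from (2.6.23). Since $\gamma > 0$ is arbitrary, we establish
$$\lim_{n\to\infty} \frac{1}{n} D(\tilde{X}^n \,\|\, U_{M_n}) = 0$$
by letting $\gamma \to 0$. □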
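
The chain of inequalities ending in $2n\gamma + n(R-\gamma)\delta_n$ holds for any single distribution on $\{1,\dots,M_n\}$ once $M_n \le e^{n(R+\gamma)}$, so it can be checked directly. The following sketch does this for random distributions, assuming natural logarithms and choosing $R = \frac{1}{n}\log M_n$ so that (2.6.24) is satisfied; the function names and parameter values are illustrative assumptions.

```python
import math
import random


def divergence_from_uniform(p):
    """D(P || U_M) in nats, where U_M is uniform on len(p) symbols."""
    m = len(p)
    return sum(v * math.log(v * m) for v in p if v > 0)


def proof_bound(p, n, rate, gamma):
    """2*n*gamma + n*(R - gamma)*delta_n, with T_n = {m : P(m) <= exp(-n(R - gamma))}."""
    threshold = math.exp(-n * (rate - gamma))
    delta = sum(v for v in p if v > threshold)  # probability mass outside T_n
    return 2 * n * gamma + n * (rate - gamma) * delta


rng = random.Random(0)
n, m, gamma = 20, 64, 0.05
rate = math.log(m) / n  # guarantees M <= exp(n * (R + gamma))
checks = []
for _ in range(200):
    w = [rng.random() ** 3 for _ in range(m)]
    p = [v / sum(w) for v in w]
    checks.append(divergence_from_uniform(p) <= proof_bound(p, n, rate, gamma) + 1e-9)

bound_holds = all(checks)
print(bound_holds)
```

For the exactly uniform distribution the divergence is zero and $\delta_n = 0$, so the bound reduces to $0 \le 2n\gamma$.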



Remark 2.6.3. Theorem 2.6.4 does not always hold if we replace the nor- 
malized divergence distance in (2.6.21) with the variational distance 

lim d((/?n(^^)5 ^Mri) — 0- (2.6.26) 

n^oo 

For example, define = {1, 2, • • • , M^} {M^ = for a constant R > 0 
and a source X = by 

Px"(x) = ^^ (x€{ 1 , 2 ,-..,K„}), 

where is an arbitrary constant satisfying 0 < Sq < 1 and Kn = M^(l — Sn) 
(^Sn = — ). The source X satisfies the strong converse property because it 
satisfies H(X) = M(X) = R. We now set 
= T'- = {1,2 ,...,M,} 

and define both an encoder — > Mn and a decoder 'ipn • Mn 

as the identical mapping. Then, the error probability €n clearly satisfies Sn = 
0 {n = 1, 2, • • •). Note that all of the conditions in Theorem 2.6.4 are satisfied. 
Since setting = ipn{X'^) leads to Px^<^(x) = P^^(x) (Vx G X'^), it follows 
that 

d{x\UMj= \Pu^A^) - PxA^)\ 

> Y. -Px-(x)| 

xgai;, 

> \M' \ (— — 

= 1 - (5„ - (1 - £0) 

= Sq 

where = {1,2,-*-, Kn}. Therefore, we obtain 
lim inf d(X^, Um^, ) > ^0 > 0, 

n -^00 

which shows that (2.6.26) does not hold. □ 






Here, we refer to an operational relationship between the fixed-length 
source coding problem and the resolvability problem in §2.2. 

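In §2.2 the problem in which, given a source $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$, we generate $\tilde{Y}^n = \phi_n(U_{M_n})$ by transforming the uniform random number $U_{M_n}$ of size $M_n$ is called the resolvability problem. Let $\phi_n : \mathcal{M}_n \to \mathcal{Y}^n$ denote the transform in the resolvability problem and set $\tilde{Y}^n = \phi_n(U_{M_n})$, where $\mathcal{M}_n = \{1, 2, \dots, M_n\}$. Then, we can define the subset $\mathcal{S}_0$ of $\mathcal{Y}^n$ by
$$\mathcal{S}_0 = \{\mathbf{y} \in \mathcal{Y}^n \mid P_{\tilde{Y}^n}(\mathbf{y}) > 0\}. \qquad (2.6.27)$$
Clearly, we have $|\mathcal{S}_0| \le M_n$, which enables us to define the encoder $\varphi_n : \mathcal{Y}^n \to \mathcal{M}_n$ that maps each element of $\mathcal{S}_0$ to a distinct element of $\mathcal{M}_n$ and all elements of $\mathcal{Y}^n - \mathcal{S}_0$ to 1. In addition, by defining the decoder $\psi_n : \mathcal{M}_n \to \mathcal{Y}^n$ as the inverse map of $\varphi_n|_{\mathcal{S}_0}$, this becomes an $(n, M_n, \varepsilon_n)$-code for the source $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ satisfying
$$\varepsilon_n = \Pr\{Y^n \notin \mathcal{S}_0\}. \qquad (2.6.28)$$
Since Remark 2.1.1 implies
$$\Pr\{\tilde{Y}^n \in \mathcal{S}_0\} - \Pr\{Y^n \in \mathcal{S}_0\} \le \frac{1}{2} d(Y^n, \tilde{Y}^n),$$
by considering $\Pr\{\tilde{Y}^n \in \mathcal{S}_0\} = 1$, it follows that
$$\Pr\{Y^n \notin \mathcal{S}_0\} \le \frac{1}{2} d(Y^n, \tilde{Y}^n).$$
Consequently, we obtain
$$\varepsilon_n \le \frac{1}{2} d(Y^n, \tilde{Y}^n) \qquad (2.6.29)$$
from (2.6.28). In this way we can construct an $(n, M_n, \varepsilon_n)$-code satisfying (2.6.29) that corresponds to a given transform $\phi_n$ for generation of the uniform random number of size $M_n$.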
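
The construction above can be sketched in a few lines. In this hypothetical example the transform is a list `phi` indexed by the uniform input, the source alphabet is `{0, ..., alphabet_size - 1}`, and elements outside $\mathcal{S}_0$ are sent to index 0 (the text uses the message 1); all names and sizes are assumptions made for illustration. The check at the end is the inequality (2.6.29).

```python
import random


def code_from_transform(phi, alphabet_size):
    """Build (encoder, decoder) from a resolvability transform phi: {0..M-1} -> alphabet."""
    m = len(phi)
    p_tilde = [phi.count(y) / m for y in range(alphabet_size)]  # law of phi(U_M)
    s0 = [y for y in range(alphabet_size) if p_tilde[y] > 0]     # the subset S_0
    index_of = {y: i for i, y in enumerate(s0)}                  # distinct index per element of S_0

    def encoder(y):
        return index_of.get(y, 0)  # everything outside S_0 goes to index 0

    def decoder(i):
        return s0[i] if i < len(s0) else s0[0]

    return encoder, decoder, p_tilde


def error_probability(p_y, encoder, decoder):
    """Pr{decoder(encoder(Y)) != Y} for a source with distribution p_y."""
    return sum(p for y, p in enumerate(p_y) if decoder(encoder(y)) != y)


rng = random.Random(1)
alphabet_size, m = 12, 8
phi = [rng.randrange(alphabet_size) for _ in range(m)]
w = [rng.random() for _ in range(alphabet_size)]
p_y = [v / sum(w) for v in w]

encoder, decoder, p_tilde = code_from_transform(phi, alphabet_size)
eps = error_probability(p_y, encoder, decoder)
d = sum(abs(a - b) for a, b in zip(p_y, p_tilde))
print(eps <= d / 2 + 1e-12)
```

Because the decoder inverts the encoder on $\mathcal{S}_0$, an error occurs exactly when the source output falls outside $\mathcal{S}_0$, which is (2.6.28).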

Conversely, suppose that an (n, Mn, ^n)-code for the source Y = 
is given. We can define the subset Sq of by 

>5o = {y G y" 1 y = i^nivniy))} , (2.6.30) 

where denotes the pair of the encoder and the decoder correspond- 

ing to the (n, Mn, ^n)-code. For an arbitrarily small 7 > 0 set M^ = MnC^^ 
and consider the uniform random number Um^ of size M^. Prom the same 
argument as the proof of Lemma 2.1.1 (Case 1)) using Um^i^o and = 
{1, 2, • ■ • , M^} instead of Tn(a) and Sn{a + 7), respectively, we can con- 
struct a transform (/>n : ^ satisfying 

^(Yn,yn) < 2 Pr { Y^ ^ 5o} + 2e-^^, (2.6.31) 

where Y'^ = (j)n{UM'^)’ Since Sn = PrjY’^ ^ 5 q}, (2.6.31) can be expressed 
as 






d{Y^, yn) < -h 2e~^^. (2.6.32) 

In this way we can construct a transform = (j)n{UM!^) for generation of 
the uniform random number Um^ that satisfies (2.6.32) corresponding to a 
given (n, Mn,£n)-code for the source Y = 

In the argument above we have seen an operational relationship between 
the fixed-length source coding problem and the resolvability problem. We can 
see that considering the subsets <So defined by (2.6.27) and (2.6.30) provides 
a key in both cases to obtaining the resolvability problem corresponding to 
the source coding problem and the source coding problem corresponding to 
the resolvability problem. That is, via the subsets <So we can find a one-to- 
one correspondence between the resolvability problem and the source coding 
problem sharing the same rate. Equations (2.6.29) and (2.6.32) guarantee 
that, if the performance of the random number generation in the resolvability 
problem is good, the performance of the corresponding source coding is also 
good and vice versa. In particular, it is guaranteed from (2.6.29) and (2.6.32) 
that a rate R being ^-achievable as a source coding rate (Definition 1.6.1) is 
equivalent to R being 2s- achievable as a resolvability rate (Definition 2.4.1) 
in the corresponding resolvability problem. 



We conclude this section with a description of a relationship between the variable-length coding problem (§1.7) and the variable-length intrinsic randomness problem (§2.5). Suppose that a general source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is given. Consider an arbitrary variable-length encoder $\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$ and a decoder $\psi_n : \mathcal{U}^* \to \mathcal{X}^n$, where $\mathcal{U} = \{0, 1, \dots, K - 1\}$ denotes a code alphabet and we assume that $\varphi_n$ is a prefix-free code. Here, we consider a code with the error probability equal to 0, i.e., satisfying
$$\varepsilon_n \equiv \Pr\{X^n \ne \psi_n(\varphi_n(X^n))\} = 0 \quad (\forall n = 1, 2, \dots)$$
as in §1.7. Since in this case there are one-to-one correspondences between the output from the source $X^n$ and the output from the encoder $\tilde{X}^n = \varphi_n(X^n)$, and between $\tilde{X}^n$ and the output from the decoder $\hat{X}^n = \psi_n(\varphi_n(X^n))$, for all $n = 1, 2, \dots$ the information-spectra of $X^n$, $\tilde{X}^n$ and $\hat{X}^n$ coincide (compare this fact with Theorem 2.6.3 in the case of the fixed-length coding).

In the case of the variable-length coding as well, we have a theorem corresponding to Theorem 2.6.4 for the fixed-length coding. That is, if the entropy rate of a source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ has the limit
$$\lim_{n\to\infty} \frac{1}{n} H_K(X^n), \qquad (2.6.33)$$
the optimal variable-length code $\varphi_n$ with the infimum achievable variable-length coding rate $R_v(\mathbf{X})$ (Definition 1.2.2) can be viewed as a random number generator that generates the variable-length uniform random number with rate $R_v(\mathbf{X})$ (§2.5) by transforming the source $\mathbf{X}$. This claim can be found in the following theorem that corresponds to Theorem 2.6.4 for the fixed-length coding. Here, we use the normalized conditional divergence distance as a measure of the probability distribution approximation (see Definition 2.5.6).

Theorem 2.6.5 (Han and Uchida [44]). Let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ be a source with the limit (2.6.33). Consider an arbitrary variable-length encoder $\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$ (prefix code) satisfying
$$\lim_{n\to\infty} \frac{1}{n} \mathrm{E}|\varphi_n(X^n)| = R_v(\mathbf{X}). \qquad (2.6.34)$$
Define the random variable $X^n_m$ taking values in $\mathcal{D}_m = \{\mathbf{x} \in \mathcal{X}^n \mid |\varphi_n(\mathbf{x})| = m\}$ by
$$P_{X^n_m}(\mathbf{x}) = \frac{P_{X^n}(\mathbf{x})}{\Pr\{X^n \in \mathcal{D}_m\}} \quad (\mathbf{x} \in \mathcal{D}_m) \qquad (2.6.35)$$
and let $I_n$ be the random variable satisfying $I_n = m$ if $X^n \in \mathcal{D}_m$. Then, it holds that
$$\lim_{n\to\infty} \frac{1}{n} D(\varphi_n(X^n) \,\|\, U^{(I_n)} \mid I_n) = 0 \qquad (2.6.36)$$
(see (2.5.28)).

Remark 2.6.4. Consider the situation that the condition (2.6.34) holds. If (2.6.36) holds, then Theorem 2.5.3 tells us that
$$\liminf_{n\to\infty} \frac{1}{n} \mathrm{E}|\varphi_n(X^n)| \le S^{*}(\mathbf{X}) = \liminf_{n\to\infty} \frac{1}{n} H_K(X^n).$$
In addition, Theorem 1.7.1 claims that
$$\limsup_{n\to\infty} \frac{1}{n} \mathrm{E}|\varphi_n(X^n)| \ge R_v(\mathbf{X}) = \limsup_{n\to\infty} \frac{1}{n} H_K(X^n).$$
On the other hand, since we have
$$\limsup_{n\to\infty} \frac{1}{n} \mathrm{E}|\varphi_n(X^n)| = \liminf_{n\to\infty} \frac{1}{n} \mathrm{E}|\varphi_n(X^n)|$$
from (2.6.34), it follows that
$$\liminf_{n\to\infty} \frac{1}{n} H_K(X^n) \ge \limsup_{n\to\infty} \frac{1}{n} H_K(X^n).$$
That is, $\lim_{n\to\infty} \frac{1}{n} H_K(X^n)$ must exist. Therefore, this, together with Theorem 2.6.5, implies that the existence of $\lim_{n\to\infty} \frac{1}{n} H_K(X^n)$ is a necessary and sufficient condition of (2.6.36) under the condition (2.6.34) (compare this remark with Remark 2.6.2). □






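Proof of Theorem 2.6.5.

Set $\mathcal{J}(\varphi_n) = \{m \mid \Pr\{I_n = m\} > 0\}$. By noticing
$$\Pr\{U^{(m)} = \mathbf{u}\} = \frac{1}{K^m} \quad (\forall \mathbf{u} \in \mathcal{U}^m)$$
for each $m \in \mathcal{J}(\varphi_n)$, the divergence $D(\varphi_n(X^n_m) \,\|\, U^{(m)})$ can be expressed as
$$D(\varphi_n(X^n_m) \,\|\, U^{(m)}) = \sum_{\mathbf{u} \in \mathcal{U}^m} \Pr\{\varphi_n(X^n_m) = \mathbf{u}\} \log_K \frac{\Pr\{\varphi_n(X^n_m) = \mathbf{u}\}}{K^{-m}}$$
$$= m - H_K(\varphi_n(X^n_m))$$
$$= m - H_K(X^n_m),$$
where the last equality follows from the fact that $\varphi_n$ is one-to-one. Then, it follows that
$$\rho_n \equiv \frac{1}{n} D(\varphi_n(X^n) \,\|\, U^{(I_n)} \mid I_n)$$
$$= \frac{1}{n} \sum_{m \in \mathcal{J}(\varphi_n)} \Pr\{I_n = m\}\, D(\varphi_n(X^n_m) \,\|\, U^{(m)})$$
$$= \frac{1}{n} \sum_{m \in \mathcal{J}(\varphi_n)} m \Pr\{I_n = m\} - \frac{1}{n} \sum_{m \in \mathcal{J}(\varphi_n)} \Pr\{I_n = m\}\, H_K(X^n_m)$$
$$= \frac{1}{n} \mathrm{E}|\varphi_n(X^n)| - \frac{1}{n} H_K(X^n \mid I_n)$$
$$= \frac{1}{n} \mathrm{E}|\varphi_n(X^n)| - \frac{1}{n} H_K(X^n) + \frac{1}{n} H_K(I_n). \qquad (2.6.37)$$
By using the inequality (cf. Csiszár and Körner [19])
$$H_K(I_n) \le \log_K [e(\mathrm{E}(I_n))] = \log_K [e(\mathrm{E}|\varphi_n(X^n)|)],$$
(2.6.37) is evaluated as
$$\rho_n \le \frac{1}{n} \mathrm{E}|\varphi_n(X^n)| - \frac{1}{n} H_K(X^n) + \frac{1}{n} \log_K [e(\mathrm{E}|\varphi_n(X^n)|)]. \qquad (2.6.38)$$
We notice here that (2.6.34) implies that the third term on the right-hand side of (2.6.38) satisfies
$$\lim_{n\to\infty} \frac{1}{n} \log_K [e(\mathrm{E}|\varphi_n(X^n)|)] = 0. \qquad (2.6.39)$$
On the other hand, since Theorem 1.7.1 implies
$$R_v(\mathbf{X}) = \lim_{n\to\infty} \frac{1}{n} H_K(X^n), \qquad (2.6.40)$$
(2.6.34) and (2.6.38)-(2.6.40) lead to
$$\limsup_{n\to\infty} \rho_n \le \lim_{n\to\infty} \frac{1}{n} \mathrm{E}|\varphi_n(X^n)| - \lim_{n\to\infty} \frac{1}{n} H_K(X^n) = R_v(\mathbf{X}) - R_v(\mathbf{X}) = 0,$$
which means $\lim_{n\to\infty} \rho_n = 0$. □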
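
The identity (2.6.37) and the entropy bound on $H_K(I_n)$ can be verified on a toy example. The sketch below takes $n = 1$, a four-symbol source and a binary prefix code ($K = 2$); the distribution, the code table and the function names are hypothetical choices made only for illustration.

```python
import math
from collections import defaultdict


def entropy(probs, base):
    """Entropy of a probability vector with logarithms to the given base."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


def conditional_divergence(p, code, K):
    """sum_m Pr{I = m} * D(phi(X_m) || U^(m)), computed directly from the definition."""
    by_len = defaultdict(list)
    for x, px in p.items():
        by_len[len(code[x])].append(px)
    total = 0.0
    for m, masses in by_len.items():
        pm = sum(masses)
        # phi is one-to-one, so phi(X_m) has the same probabilities as X_m.
        d_m = sum((px / pm) * math.log((px / pm) * K ** m, K) for px in masses if px > 0)
        total += pm * d_m
    return total


K = 2
p = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}
code = {"a": "0", "b": "10", "c": "110", "d": "111"}  # a binary prefix code

length_probs = defaultdict(float)
for x, px in p.items():
    length_probs[len(code[x])] += px

expected_length = sum(m * q for m, q in length_probs.items())
lhs = conditional_divergence(p, code, K)
rhs = expected_length - entropy(p.values(), K) + entropy(length_probs.values(), K)
h_len = entropy(length_probs.values(), K)
print(abs(lhs - rhs) < 1e-12, h_len <= math.log(math.e * expected_length, K))
```

The first printed value checks (2.6.37) with $n = 1$; the second checks $H_K(I_n) \le \log_K[e\,\mathrm{E}(I_n)]$ for this particular length distribution.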



Remark 2.6.5. If the normalized divergence distance (2.6.36) is replaced 
with 

lim V Pr {/„ = m} ), = 0, (2.6.41) 

n^oo 

m 

the expectation of the variational distance, Theorem 2.6.5 does not always 
hold. This fact clearly follows by considering the source X = and 

the fixed- length coding given in Remark 2.6.3. □ 



Remark 2.6.6. Even in the case that the limit (2.6.33) does not exist, the claim (2.6.36) in Theorem 2.6.5 still holds if we replace the condition (2.6.34) with
$$\lim_{n\to\infty} \frac{1}{n} \bigl| \mathrm{E}|\varphi_n(X^n)| - H_K(X^n) \bigr| = 0. \qquad (2.6.42)$$
This fact is easily verified by checking the proof of Theorem 2.6.5 carefully. In this case, the generating rate of the variable-length uniform random number $\varphi_n(X^n)$ is equal to
$$\liminf_{n\to\infty} \frac{1}{n} H_K(X^n).$$
That is, such $\varphi_n$ achieves the optimal variable-length uniform random number generating rate $S^{*}(\mathbf{X})$. □



Remark 2.6.7. Visweswariah, Kulkarni and Verdú [94] considered the problem using
$$\lim_{n\to\infty} \max_{m \in G_n} \frac{1}{m} D(\varphi_n(X^n_m) \,\|\, U^{(m)}) = 0 \qquad (2.6.43)$$
instead of (2.6.36) as a measure of the probability distribution approximation and showed a theorem corresponding to Remark 2.6.6 for the case that $\mathcal{X}$ is a finite source alphabet. Here, $G_n$ is a finite subset of $\mathcal{J}(\varphi_n) = \{m \mid \Pr\{|\varphi_n(X^n)| = m\} > 0\}$ satisfying $\lim_{n\to\infty} \Pr\{I_n \in G_n\} = 1$. Note that we cannot judge in general which of the conditions (2.6.36) and (2.6.43) is weaker because $G_n = \mathcal{J}(\varphi_n)$ does not always hold. □






Remark 2.6.8. We can obtain a theorem corresponding to Theorem 2.6.5 
on the weak variable- length coding described in §1.8. That is, for the case 

that the limit lim exists for an arbitrary and sufficiently small 

n-^oo 77, ^ ^ 

0 < ^ < 1, the claim (2.6.36) in Theorem 2.6.5 holds for any weak variable- 
length encoder (pn satisfying 

lim 1 e|v?„(X”)| = i?*(X) and lim £„ = 0 

n—^oo u n^oo 

(the first equation means that the weak variable-length code has the optimal 
rate R*(X)), where Sn denotes the error probability of the weak variable- 
length code. To establish this fact, we just take (pn, being the weak variable- 
length encoder, into consideration and evaluate ^tgain in the 

proof of Theorem 2.6.5. □ 



3 Channel Coding 



3.1 Channel Coding: Stationary Memoryless Channel 

Letting $\mathcal{X}$ be an input alphabet and $\mathcal{Y}$ an output alphabet, we arbitrarily fix a collection of conditional probabilities $W = \{W(y|x)\}_{x \in \mathcal{X}, y \in \mathcal{Y}}$ called the transition probabilities of a channel satisfying the condition
$$\sum_{y \in \mathcal{Y}} W(y|x) = 1 \quad (\forall x \in \mathcal{X}).$$
We often use the notation $W : \mathcal{X} \to \mathcal{Y}$. In this section we assume for simplicity that both $\mathcal{X}$ and $\mathcal{Y}$ are finite sets.

We call a channel that outputs an output sequence
$$\mathbf{y} = (y_1, y_2, \dots, y_n) \in \mathcal{Y}^n$$
corresponding to a given input sequence
$$\mathbf{x} = (x_1, x_2, \dots, x_n) \in \mathcal{X}^n$$
with the conditional probability $W^n(\mathbf{y}|\mathbf{x})$ defined by
$$W^n(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{n} W(y_i|x_i)$$
a stationary memoryless channel $W$ and denote it by $\mathbf{W} = \{W\}$. We transmit information through this channel in the following way. Denoting by $\mathcal{M}_n = \{1, 2, \dots, M_n\}$ a message set, we call a mapping $\varphi_n : \mathcal{M}_n \to \mathcal{X}^n$ that transforms a message into a channel input of length $n$ the encoding function (encoder). Here, $\mathbf{u}_i = \varphi_n(i)$ is called the codeword for a message $i$ and $\mathcal{C}_n = \{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_{M_n}\}$ the code. When a transmitter wants to send the message $i$, the encoder $\varphi_n$ injects the codeword $\mathbf{u}_i$ of $i$ into a channel. On the other hand, a receiver who receives an output $\mathbf{y}$ from the channel partitions $\mathcal{Y}^n$ into disjoint subsets
$$\mathcal{Y}^n = \mathcal{D}_1 \cup \dots \cup \mathcal{D}_{M_n} \quad (\mathcal{D}_i \cap \mathcal{D}_j = \emptyset \text{ for } i \ne j)$$
in advance and judges that $i \in \mathcal{M}_n$ is transmitted if $\mathbf{y} \in \mathcal{D}_i$ (such $\mathcal{D}_i$ is called the decoding region of message $i$). We call this operation decoding and the mapping $\psi_n : \mathcal{Y}^n \to \mathcal{M}_n$ expressing this operation a decoding function (decoder) (Fig. 3.1).

Fig. 3.1. A message $i \in \mathcal{M}_n$ passing through the encoder, the channel, and the decoder.

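We now define the coding rate (or transmission rate) by
$$r_n = \frac{1}{n} \log M_n.$$
The coding rate means the amount of information transmitted per channel use. We also define the error probability $\varepsilon_n$ in channel coding by
$$\varepsilon_n = \frac{1}{M_n} \sum_{i=1}^{M_n} W^n(\mathcal{D}_i^c \mid \mathbf{u}_i), \qquad (3.1.1)$$
where $\mathcal{D}_i^c$ denotes the complement of $\mathcal{D}_i$. Notice here that $\varepsilon_n$ is defined as the average of the error probability under the assumption that every message $i \in \mathcal{M}_n$ is equally likely to be generated. We call the code $\mathcal{C}_n = \{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_{M_n}\}$ with the message set $\mathcal{M}_n$ of size $M_n$ and the error probability $\varepsilon_n$ the $(n, M_n, \varepsilon_n)$-code.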

Problems of channel coding are usually formulated as maximization of the coding rate subject to the condition that there exists a pair of an encoder and a decoder $(\varphi_n, \psi_n)$ with the error probability less than a value given in advance. We first consider the case that the error probability is required to satisfy $\varepsilon_n \to 0$ as $n \to \infty$. We give the following definitions:

Definition 3.1.1.

Rate $R$ is achievable $\overset{\mathrm{def}}{\Longleftrightarrow}$ There exists an $(n, M_n, \varepsilon_n)$-code satisfying
$$\lim_{n\to\infty} \varepsilon_n = 0 \quad\text{and}\quad \liminf_{n\to\infty} \frac{1}{n} \log M_n \ge R.$$

Definition 3.1.2 (Channel capacity).
$$C(\mathbf{W}) = \sup \{R \mid R \text{ is achievable}\}.$$



For random variables $X$ and $Y$ we define $I(X;Y)$ by
$$I(X;Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P_{XY}(x, y) \log \frac{P_{Y|X}(y|x)}{P_Y(y)} \qquad (3.1.2)$$
and call $I(X;Y)$ the mutual information between $X$ and $Y$, where $P_{Y|X}(y|x)$ represents the conditional probability of $Y = y$ given $X = x$.
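
The definition (3.1.2) translates directly into code. The following minimal sketch, with natural logarithms and a hypothetical function name `mutual_information`, evaluates $I(X;Y)$ for an input distribution `p_x` and a transition matrix `W[x][y]`, and compares the result for a binary symmetric channel with uniform input against the closed form $\log 2 - h(p)$, where $h$ is the binary entropy function.

```python
import math


def mutual_information(p_x, W):
    """I(X;Y) in nats for input distribution p_x and channel matrix W[x][y]."""
    n_x, n_y = len(p_x), len(W[0])
    p_y = [sum(p_x[x] * W[x][y] for x in range(n_x)) for y in range(n_y)]
    return sum(
        p_x[x] * W[x][y] * math.log(W[x][y] / p_y[y])
        for x in range(n_x)
        for y in range(n_y)
        if p_x[x] > 0 and W[x][y] > 0
    )


crossover = 0.1  # illustrative crossover probability of a binary symmetric channel
bsc = [[1 - crossover, crossover], [crossover, 1 - crossover]]
mi = mutual_information([0.5, 0.5], bsc)

binary_entropy = -crossover * math.log(crossover) - (1 - crossover) * math.log(1 - crossover)
print(abs(mi - (math.log(2) - binary_entropy)) < 1e-12)
```

A channel whose rows are identical gives $I(X;Y) = 0$ for every input distribution, since the output is then independent of the input.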

Then, we have the following fundamental theorem. 






Theorem 3.1.1 (Shannon [77]). The channel capacity $C(\mathbf{W})$ of a stationary memoryless channel $\mathbf{W} = \{W\}$ with finite input and output alphabets is given by
$$C(\mathbf{W}) = C(W) \equiv \max_{X} I(X;Y),$$
where $P_{XY}(x, y) = P_X(x) W(y|x)$ ($Y$ is the output variable of the channel $W$ with an input variable $X$) and $\max_X$ denotes the maximum with respect to all random variables $X$ over the input alphabet $\mathcal{X}$.

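Proof.

1) Direct part:

We prove that $R = I(X;Y) - 2\gamma$ is achievable for an arbitrary input variable $X$, where $\gamma > 0$ is an arbitrarily small constant. Let $X^nY^n = (X_1Y_1, \dots, X_nY_n)$ be stationary independent copies of $XY$ ($X^nY^n$ is also simply called an i.i.d. sequence) and set $X^n = (X_1, X_2, \dots, X_n)$ and $Y^n = (Y_1, Y_2, \dots, Y_n)$. Define
$$T_n = \left\{ (\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n \,\middle|\, \frac{1}{n} \log \frac{W^n(\mathbf{y}|\mathbf{x})}{P_{Y^n}(\mathbf{y})} \ge I(X;Y) - \gamma \right\}, \qquad (3.1.3)$$
where elements of $T_n$ are called the typical sequences in channel coding. On the other hand, we clearly have
$$\frac{1}{n} \log \frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} = \frac{1}{n} \sum_{i=1}^{n} \log \frac{W(Y_i|X_i)}{P_Y(Y_i)}. \qquad (3.1.4)$$
Note here that each term in the sum on the right-hand side is independent and subject to an identical probability distribution with mean $I(X;Y)$. Furthermore, the variance of each term is independent of $i$ and uniformly upper bounded, not relying on $W(\cdot|\cdot)$ and $P_X(\cdot)$, $P_Y(\cdot)$ (see Remark 3.1.1 below). By taking these properties into consideration, Chebyshev’s inequality tells us that
$$\Pr\{X^nY^n \in T_n\} \to 1 \quad \text{as } n \to \infty. \qquad (3.1.5)$$
We now set $M_n = e^{nR} = e^{n(I(X;Y) - 2\gamma)}$ and define the code as $\mathcal{C}_n = \{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_{M_n}\}$, where $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_{M_n} \in \mathcal{X}^n$ are independently generated subject to the probability distribution $P_{X^n}$ (such a code is called a random code). We define decoding as follows. Let $\mathbf{y} \in \mathcal{Y}^n$ be an output that a receiver receives. If there exists a unique $\mathbf{u}_i$ ($i \in \mathcal{M}_n$) satisfying
$$(\mathbf{u}_i, \mathbf{y}) \in T_n,$$
the decoder is defined to output $i$ equal to the index of $\mathbf{u}_i$. If there exists no such $\mathbf{u}_i$ or there exists more than one such $\mathbf{u}_i$, the receiver judges that an error occurred. Then, if we define the event $E_i$ by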




Ei = {{ui,y) eTn} (ieMn), 

the expectation of the error probability of a message iq e Mn with respect 
to the random code which is denoted by can be evaluated as 



Sn(io) = Pr i U IJ Si 






<Pr{S^}+Pri ySi 



^i^io 

<Pr{S“J + ^Pr{Si}, 

ij^io 

where y is interpreted as an output corresponding to the input We 
notice here that £n(^o) docs not depend on io C Mn due to the symmetry of 
the random code. Thus, Sn defined as the average of Sni'^o) with respect to 
io G Mn can be written as 






lQ = l 



Therefore, letting y denote an output corresponding to the input ui, we have 

£„ <Pr{SJ} + ^Pr{Si}. (3.1.6) 

27^1 



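Hereafter, we evaluate the right-hand side of (3.1.6). In view of (3.1.5) we first obtain

$$\Pr\{E_1^c\} = \Pr\{X^nY^n \notin T_n\} \to 0 \quad \text{as } n \to \infty. \tag{3.1.7}$$

We notice that $\mathbf{u}_i$ with $i \neq 1$ and $\mathbf{y}$ are independently subject to $P_{X^n}$ and $P_{Y^n}$, respectively. This allows us to write $\Pr\{E_i\}$ in the sum in (3.1.6) as

$$\Pr\{E_i\} = \sum_{(\mathbf{x},\mathbf{y}) \in T_n} P_{X^n}(\mathbf{x})P_{Y^n}(\mathbf{y}).$$

Since we have

$$P_{Y^n}(\mathbf{y}) \le P_{Y^n|X^n}(\mathbf{y}|\mathbf{x})\,e^{-n(I(X;Y)-\gamma)}$$

for $(\mathbf{x},\mathbf{y}) \in T_n$, it follows that

$$\Pr\{E_i\} \le \sum_{(\mathbf{x},\mathbf{y}) \in T_n} P_{X^n}(\mathbf{x})P_{Y^n|X^n}(\mathbf{y}|\mathbf{x})\,e^{-n(I(X;Y)-\gamma)} \le e^{-n(I(X;Y)-\gamma)}.$$

Therefore, we obtain

$$\begin{aligned}
\bar{\varepsilon}_n &\le \Pr\{X^nY^n \notin T_n\} + (M_n - 1)\,e^{-n(I(X;Y)-\gamma)} \\
&\le \Pr\{X^nY^n \notin T_n\} + e^{n(I(X;Y)-2\gamma)}\,e^{-n(I(X;Y)-\gamma)} \\
&= \Pr\{X^nY^n \notin T_n\} + e^{-n\gamma},
\end{aligned}$$

which, together with (3.1.7), yields

$$\lim_{n \to \infty} \bar{\varepsilon}_n = 0. \tag{3.1.8}$$

Since $\bar{\varepsilon}_n$ is defined as the expectation with respect to the random code, (3.1.8) guarantees the existence of at least one deterministic $(n, M_n, \varepsilon_n)$-code with the error probability $\varepsilon_n$ satisfying $\lim_{n \to \infty}\varepsilon_n = 0$. Notice here that

$$\liminf_{n \to \infty}\frac{1}{n}\log M_n \ge R = I(X;Y) - 2\gamma$$

trivially holds. This shows that $R = I(X;Y) - 2\gamma$ is achievable. Since the input variable $X$ is arbitrary and $\gamma > 0$ can be arbitrarily small, this completes the proof that any $R$ with $R < \max_X I(X;Y)$ is achievable.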

2) Converse part:

We prove this part by using the Fano inequality (Lemma 1.1.1). If a rate $R$ is achievable, there exists an $(n, M_n, \varepsilon_n)$-code $\mathcal{C}_n = \{\mathbf{u}_1, \mathbf{u}_2, \cdots, \mathbf{u}_{M_n}\}$ ($\mathbf{u}_i \in \mathcal{X}^n$) satisfying

$$\lim_{n \to \infty}\varepsilon_n = 0 \quad \text{and} \quad \liminf_{n \to \infty}\frac{1}{n}\log M_n \ge R. \tag{3.1.9}$$

Let $X^n$ be the random variable uniformly distributed over the code $\mathcal{C}_n = \{\mathbf{u}_1, \mathbf{u}_2, \cdots, \mathbf{u}_{M_n}\}$ and denote by $Y^n$ the channel output corresponding to the input $X^n$. If we introduce a new random variable $\hat{X}^n$ such that $\hat{X}^n = \mathbf{u}_i$ if $i = \psi_n(Y^n)$, the error probability $\varepsilon_n$ can be written as

$$\varepsilon_n = \Pr\{\hat{X}^n \neq X^n\}.$$

We first notice that

$$\begin{aligned}
\log M_n &= H(X^n) \\
&= I(X^n;\hat{X}^n) + H(X^n|\hat{X}^n) \\
&\le I(X^n;Y^n) + H(X^n|\hat{X}^n),
\end{aligned}$$

where the inequality follows because $X^n \to Y^n \to \hat{X}^n$ forms a Markov chain and hence $I(X^n;\hat{X}^n) \le I(X^n;Y^n)$ due to the information processing inequality (cf. Cover and Thomas [17]). In addition, the Fano inequality implies

$$H(X^n|\hat{X}^n) \le \varepsilon_n\log M_n + h(\varepsilon_n).$$

Therefore, it follows that

$$\log M_n \le I(X^n;Y^n) + \varepsilon_n\log M_n + h(\varepsilon_n),$$

i.e.,

$$\log M_n \le \frac{I(X^n;Y^n) + h(\varepsilon_n)}{1 - \varepsilon_n}. \tag{3.1.10}$$

By noting $\varepsilon_n \to 0$ and $h(\varepsilon_n) \to 0$ as $n \to \infty$, (3.1.9) implies

$$\liminf_{n \to \infty}\frac{1}{n}\log M_n \le \liminf_{n \to \infty}\frac{1}{n}I(X^n;Y^n). \tag{3.1.11}$$

On the other hand, by setting $X^n = (X_1, X_2, \cdots, X_n)$, $Y^n = (Y_1, Y_2, \cdots, Y_n)$, and $Y^{i-1} = (Y_1, Y_2, \cdots, Y_{i-1})$ and using

$$H(Y^n) \le \sum_{i=1}^{n}H(Y_i),$$

$$H(Y^n|X^n) = \sum_{i=1}^{n}H(Y_i|X^nY^{i-1}),$$

the assumption that the channel is stationary and memoryless leads to

$$\begin{aligned}
I(X^n;Y^n) &= H(Y^n) - H(Y^n|X^n) \\
&\le \sum_{i=1}^{n}H(Y_i) - \sum_{i=1}^{n}H(Y_i|X^nY^{i-1}) \\
&= \sum_{i=1}^{n}H(Y_i) - \sum_{i=1}^{n}H(Y_i|X_i) \\
&= \sum_{i=1}^{n}I(X_i;Y_i) \le n\max_X I(X;Y).
\end{aligned}$$

Therefore, we have $R \le \max_X I(X;Y)$ from (3.1.9) and (3.1.11). That is, any achievable rate $R$ cannot be greater than $\max_X I(X;Y)$. □



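Remark 3.1.1 (Uniformly bounded variance). Suppose that either input alphabet $\mathcal{X}$ or output alphabet $\mathcal{Y}$ is finite. We write each term in the sum in (3.1.4) as

$$Z = \log\frac{W(Y|X)}{P_Y(Y)}, \tag{3.1.12}$$

where the index "$i$" is omitted for simplicity. First, we express the expectation $\mathrm{E}(Z)$ of $Z$ (that is, the mutual information between $X$ and $Y$) as

$$\mathrm{E}(Z) = I(X;Y) = H(Y) - H(Y|X).$$

Since without loss of generality we may assume that the output alphabet $\mathcal{Y}$ is finite and so $H(Y|X) \ge 0$, we obtain

$$0 \le \mathrm{E}(Z) \le H(Y) \le \log|\mathcal{Y}| \tag{3.1.13}$$

(cf. Cover and Thomas [17]). This means that $\mathrm{E}(Z)$ is uniformly bounded, by a constant that depends on neither $W(\cdot|\cdot)$ nor $P_Y(\cdot)$. Next, we express the variance $\mathrm{V}(Z)$ of $Z$ as

$$\mathrm{V}(Z) = \mathrm{E}(Z - \mathrm{E}(Z))^2 = \mathrm{E}(Z^2) - (\mathrm{E}(Z))^2$$

and show that $\mathrm{V}(Z)$ is uniformly bounded. The equation above indicates that it suffices to show that $\mathrm{E}(Z^2)$ is uniformly bounded. To this end, noticing that $0 \le P_Y(y) \le 1$ and $0 \le W(y|x) \le 1$, we evaluate $\mathrm{E}(Z^2)$ in the following way:

$$\begin{aligned}
\mathrm{E}(Z^2) &= \sum_{x \in \mathcal{X}}\sum_{y \in \mathcal{Y}}P_{XY}(x,y)\left(\log\frac{W(y|x)}{P_Y(y)}\right)^2 \\
&= \sum_{(x,y):\,W(y|x) \le P_Y(y)}P_{XY}(x,y)\left(\log\frac{W(y|x)}{P_Y(y)}\right)^2 + \sum_{(x,y):\,W(y|x) > P_Y(y)}P_{XY}(x,y)\left(\log\frac{W(y|x)}{P_Y(y)}\right)^2 \\
&= \sum_{(x,y):\,W(y|x) \le P_Y(y)}P_{XY}(x,y)\left(\log\frac{P_Y(y)}{W(y|x)}\right)^2 + \sum_{(x,y):\,W(y|x) > P_Y(y)}P_{XY}(x,y)\left(\log\frac{W(y|x)}{P_Y(y)}\right)^2 \\
&\le \sum_{(x,y):\,W(y|x) \le P_Y(y)}P_{XY}(x,y)\left(\log\frac{1}{W(y|x)}\right)^2 + \sum_{(x,y):\,W(y|x) > P_Y(y)}P_{XY}(x,y)\left(\log\frac{1}{P_Y(y)}\right)^2 \\
&\le \sum_{(x,y) \in \mathcal{X} \times \mathcal{Y}}P_{XY}(x,y)\left(\log\frac{1}{W(y|x)}\right)^2 + \sum_{(x,y) \in \mathcal{X} \times \mathcal{Y}}P_{XY}(x,y)\left(\log\frac{1}{P_Y(y)}\right)^2 \\
&= \sum_{x \in \mathcal{X}}P_X(x)\sum_{y \in \mathcal{Y}}W(y|x)\left(\log\frac{1}{W(y|x)}\right)^2 + \sum_{y \in \mathcal{Y}}P_Y(y)\left(\log\frac{1}{P_Y(y)}\right)^2.
\end{aligned} \tag{3.1.14}$$

Now, define a function $g(u)$ by

$$g(u) = u\left(\log\frac{1}{u}\right)^2 \quad (0 < u \le 1).$$

By setting $t = \log\frac{1}{u}$, $g(u)$ can be written as

$$g(u) = k(t) = t^2e^{-t} \quad (0 \le t < +\infty).$$

We can verify that $k(t)$ reaches the maximum at $t = 2$ and therefore

$$\max_{0 < u \le 1}g(u) = \max_{0 \le t < +\infty}k(t) = \frac{4}{e^2}. \tag{3.1.15}$$

By applying (3.1.15) to each of the $|\mathcal{Y}|$ terms of the two inner sums on the right-hand side of (3.1.14), we obtain

$$\mathrm{E}(Z^2) \le \frac{8}{e^2}|\mathcal{Y}|,$$

which means that the variance $\mathrm{V}(Z)$ is uniformly bounded, by a constant that depends on neither $W(\cdot|\cdot)$ nor $P_Y(\cdot)$. □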
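The two numerical facts behind Remark 3.1.1 can be checked directly. The Python sketch below is not from the book: the helper names `g`, `second_moment`, `random_distribution`, and `worst_case` are illustrative assumptions, and natural logarithms are assumed throughout. It evaluates $g(u) = u(\log\frac{1}{u})^2$ on a grid to confirm that its maximum is $4/e^2$, and it compares $\mathrm{E}(Z^2)$ for randomly drawn finite channels against the bound $8|\mathcal{Y}|/e^2$.

```python
import math
import random


def g(u):
    """g(u) = u * (ln(1/u))**2 on 0 < u <= 1 (natural logarithm assumed)."""
    return u * math.log(1.0 / u) ** 2


def second_moment(px, w):
    """E(Z^2) for Z = ln(W(Y|X) / P_Y(Y)), with P_X = px and W(y|x) = w[x][y]."""
    ny = len(w[0])
    py = [sum(px[x] * w[x][y] for x in range(len(px))) for y in range(ny)]
    total = 0.0
    for x, p in enumerate(px):
        for y in range(ny):
            joint = p * w[x][y]
            if joint > 0.0:  # zero-probability pairs contribute nothing
                total += joint * math.log(w[x][y] / py[y]) ** 2
    return total


def random_distribution(k, rng):
    """A random probability vector of length k (illustrative helper)."""
    weights = [rng.random() for _ in range(k)]
    s = sum(weights)
    return [v / s for v in weights]


def worst_case(trials=200, nx=5, ny=3, seed=0):
    """Largest E(Z^2) seen over random (P_X, W), and the bound 8|Y|/e^2."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        px = random_distribution(nx, rng)
        w = [random_distribution(ny, rng) for _ in range(nx)]
        worst = max(worst, second_moment(px, w))
    return worst, 8.0 * ny / math.e ** 2


# Grid search for the maximum of g on (0, 1]; the maximizer is u = e^{-2}.
grid_max = max(g(i / 100000) for i in range(1, 100001))
worst, bound = worst_case()
```

The random search only probes the bound and does not prove it. It is usually far from tight, since each term of (3.1.14) is replaced by its individual maximum.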



3.2 Coding for General Channel 

In this section we consider coding of a general channel not restricted to stationary memoryless channels. To this end, we first define a general channel with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$ in the following way. Let $W^n$ be an arbitrary conditional probability distribution satisfying

$$\sum_{\mathbf{y} \in \mathcal{Y}^n}W^n(\mathbf{y}|\mathbf{x}) = 1 \quad (\forall\mathbf{x} \in \mathcal{X}^n) \tag{3.2.1}$$

for each $n = 1, 2, \cdots$. We call the sequence $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ a general channel. We often use the notation $W^n : \mathcal{X}^n \to \mathcal{Y}^n$ so as to explicitly write an input alphabet and an output alphabet. All of the results given hereafter throughout this book, unless clearly stated otherwise, hold for cases where $\mathcal{X}$ and $\mathcal{Y}$ are any abstract sets, including countably infinite or continuous alphabets.

Remark 3.2.1. In the case where either the input alphabet $\mathcal{X}$ or the output alphabet $\mathcal{Y}$ is abstract in general, it is understood that $P_{X^n}(\mathbf{x})$, $P_{Y^n}(\mathbf{y})$, $W^n(\mathbf{y}|\mathbf{x})$ denote the corresponding (conditional) probability measures $P_{X^n}(d\mathbf{x})$, $P_{Y^n}(d\mathbf{y})$, $W^n(d\mathbf{y}|\mathbf{x})$, respectively, with the integral $\int$ in place of the summation $\sum$. This convention requires an underlying adequate probability space. However, we omit the subtle formal definitions of such a probability space so as to avoid complication, since the probability space is naturally determined from the context. Readers interested in the technical details should see, for example, Gray [31] and Dobrushin [23]. □






Notice that we consider quite a wide class of channels, including all nonstationary or nonergodic channels, since the channels we treat are only required to satisfy (3.2.1). In addition, this class of channels includes a channel with any kind of memory structure.

Next, we consider a general input process* $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ where $X^n$ is an arbitrary random variable taking values in $\mathcal{X}^n$. Then, we define the output process $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ of a channel $\mathbf{W}$ corresponding to $\mathbf{X}$ by

$$P_{X^nY^n}(\mathbf{x},\mathbf{y}) = P_{X^n}(\mathbf{x})W^n(\mathbf{y}|\mathbf{x}),$$

where we note that $Y^n$ is a random variable taking values in $\mathcal{Y}^n$. In this general channel coding problem, the following quantity plays a key role instead of the ordinary mutual information.



Definition 3.2.1.

$$\underline{I}(\mathbf{X};\mathbf{Y}) = \text{p-}\liminf_{n \to \infty}\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)}.$$

Here, we call

$$\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)}$$

on the right-hand side† the mutual information density rate of $(\mathbf{X},\mathbf{Y})$ and its probability distribution the mutual information spectrum (or, more generally, the information-spectrum) of $(\mathbf{X},\mathbf{Y})$. In addition, we call $\underline{I}(\mathbf{X};\mathbf{Y})$ the spectral inf-mutual information rate.

A basic property of the mutual information density rate is obtained from the following lemma.

Lemma 3.2.1. Let $\{U_n\}$ and $\{V_n\}$ be arbitrary sequences of random variables taking values in source alphabets $\{\mathcal{Z}_n\}$. Let $\gamma > 0$ be an arbitrary constant. Then, for all $n = 1, 2, \cdots$ it holds that

$$\Pr\left\{\frac{1}{n}\log\frac{P_{U_n}(U_n)}{P_{V_n}(U_n)} \le -\gamma\right\} \le e^{-n\gamma}.$$

Proof. Define

$$B_n = \left\{\mathbf{z} \in \mathcal{Z}_n \;\middle|\; \frac{1}{n}\log\frac{P_{U_n}(\mathbf{z})}{P_{V_n}(\mathbf{z})} \le -\gamma\right\}.$$

By noticing $P_{U_n}(\mathbf{z}) \le P_{V_n}(\mathbf{z})e^{-n\gamma}$ for $\mathbf{z} \in B_n$, it follows that

$$\begin{aligned}
\Pr\left\{\frac{1}{n}\log\frac{P_{U_n}(U_n)}{P_{V_n}(U_n)} \le -\gamma\right\} &= \sum_{\mathbf{z} \in B_n}P_{U_n}(\mathbf{z}) \\
&\le \sum_{\mathbf{z} \in B_n}P_{V_n}(\mathbf{z})e^{-n\gamma} \\
&\le e^{-n\gamma}.
\end{aligned}$$

□

* Throughout this book, depending on the context we use the terms "process" and "sequence" interchangeably to denote the general source (not necessarily satisfying the consistency) as defined in Chapter 1.

† Let $f_n(\mathbf{y}|\mathbf{x}) = \frac{W^n(d\mathbf{y}|\mathbf{x})}{P_{Y^n}(d\mathbf{y})}$ denote the Radon-Nikodym derivative between two probability measures on $\mathcal{Y}^n$, with values on a singular set assumed conventionally to be $+\infty$. Then, $\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)}$ is defined to be $f_n(Y^n|X^n)$, which is obviously a random variable (see Remark 3.2.1).
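Lemma 3.2.1 can be checked by brute force on a small alphabet. The Python sketch below is an illustration only: the function name `lemma_321_tail` is hypothetical, natural logarithms are assumed, and $U_n$, $V_n$ are taken to be i.i.d. Bernoulli sequences with parameters strictly between 0 and 1 purely so that $\mathcal{Z}_n = \{0,1\}^n$ can be enumerated. The lemma itself places no such restriction on the distributions.

```python
import itertools
import math


def lemma_321_tail(p, q, n, gamma):
    """Pr{(1/n) ln(P_U(U)/P_V(U)) <= -gamma} by full enumeration.

    U ~ Bernoulli(p)^n and V ~ Bernoulli(q)^n with 0 < p, q < 1
    (an assumption made only to keep the example enumerable).
    """
    total = 0.0
    for z in itertools.product((0, 1), repeat=n):
        k = sum(z)  # number of ones in z
        pu = p ** k * (1 - p) ** (n - k)
        pv = q ** k * (1 - q) ** (n - k)
        if math.log(pu / pv) / n <= -gamma:  # z belongs to B_n
            total += pu
    return total


n, gamma = 8, 0.1
tail = lemma_321_tail(0.3, 0.6, n, gamma)
bound = math.exp(-n * gamma)
```

Here `tail` is the left-hand side of the lemma and `bound` the right-hand side. Only the event $B_n$ and the inequality $P_{U_n}(\mathbf{z}) \le P_{V_n}(\mathbf{z})e^{-n\gamma}$ on it are used, which is why the bound holds for every pair of distributions.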



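Setting $\mathcal{Z}_n = \mathcal{X}^n \times \mathcal{Y}^n$, $P_{U_n}(\mathbf{x},\mathbf{y}) = P_{X^nY^n}(\mathbf{x},\mathbf{y})$ and $P_{V_n}(\mathbf{x},\mathbf{y}) = P_{X^n}(\mathbf{x})P_{Y^n}(\mathbf{y})$ in Lemma 3.2.1 and recalling $P_{X^nY^n}(\mathbf{x},\mathbf{y}) = P_{X^n}(\mathbf{x})W^n(\mathbf{y}|\mathbf{x})$ lead to

$$\Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le -\gamma\right\} \le e^{-n\gamma} \tag{3.2.2}$$

for $n = 1, 2, \cdots$, where $\gamma > 0$ is an arbitrary constant. Therefore, in view of the definition of $\underline{I}(\mathbf{X};\mathbf{Y})$, it turns out that the spectral inf-mutual information rate is always nonnegative. That is,

$$\underline{I}(\mathbf{X};\mathbf{Y}) \ge 0. \tag{3.2.3}$$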



We define the notions of the code, encoding, decoding, the coding rate, the 
error probability and the channel capacity in the same way as the preceding 
section (cf. Definition 3.1.1 and Definition 3.1.2). Then, we have the following 
quite general theorem. 



Theorem 3.2.1 (Verdú and Han [91]). For any channel $\mathbf{W}$ with arbitrary abstract input and output alphabets $\mathcal{X}$ and $\mathcal{Y}$, the channel capacity $C(\mathbf{W})$ is given by

$$C(\mathbf{W}) = \sup_{\mathbf{X}}\underline{I}(\mathbf{X};\mathbf{Y}), \tag{3.2.4}$$

where $\sup_{\mathbf{X}}$ denotes the supremum with respect to all the input processes $\mathbf{X}$. □



Remark 3.2.2. In fact, $\sup_{\mathbf{X}}$ on the right-hand side of (3.2.4) can be replaced with $\max_{\mathbf{X}}$. This is because $\mathbf{X}$ can be any general source. □



We introduce here the following notation, which is used not only in this chapter but also in the following chapters. Given two random variables $Z$ and $V$, we denote by $P_{Z|V}$ the conditional probability distribution of $Z$ given $V$. In other words, $P_{Z|V}(z|v) = \Pr\{Z = z \mid V = v\}$.

Proof of Theorem 3.2.1.

1) Direct part:

It suffices to show that $R = \underline{I}(\mathbf{X};\mathbf{Y}) - 2\gamma$ is achievable for any input process $\mathbf{X}$, where $\gamma > 0$ is an arbitrarily small constant. We can develop this part by using the same argument used in the proof of the direct part of Theorem 3.1.1 described in the preceding section. Here, we replace $T_n$ in (3.1.3) with

$$T_n = \left\{(\mathbf{x},\mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n \;\middle|\; \frac{1}{n}\log\frac{W^n(\mathbf{y}|\mathbf{x})}{P_{Y^n}(\mathbf{y})} > \underline{I}(\mathbf{X};\mathbf{Y}) - \gamma\right\}$$

and the mutual information $I(X;Y)$ with $\underline{I}(\mathbf{X};\mathbf{Y})$. In this setting the property (3.1.5) is obtained directly from the definition of $\underline{I}(\mathbf{X};\mathbf{Y})$ without using Chebyshev's inequality.



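2) Converse part:

Suppose that an $(n, M_n, \varepsilon_n)$-code satisfying $\liminf_{n \to \infty}\frac{1}{n}\log M_n \ge R$ and $\lim_{n \to \infty}\varepsilon_n = 0$ is given. Denote this code by $\mathcal{C}_n = \{\mathbf{u}_1, \mathbf{u}_2, \cdots, \mathbf{u}_{M_n}\}$, where $\mathbf{u}_i \in \mathcal{X}^n$. When we treat general channels not restricted to the stationary memoryless channels treated in the preceding section, the Fano inequality does not give a sufficiently tight upper bound on the achievable rates. Therefore, we need to develop another inequality, given in the following lemma, which is completely different from the Fano inequality. This lemma corresponds to Lemma 1.3.2 on source coding (we will generalize this lemma to Lemma 3.8.2 in §3.8).

Lemma 3.2.2 (Verdú and Han [91]). Let $X^n$ be the random variable uniformly distributed over an $(n, M_n, \varepsilon_n)$-code $\mathcal{C}_n$ and $Y^n$ the output of a channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ with $X^n$ as the input. Then, for each $n = 1, 2, \cdots$ it holds that

$$\varepsilon_n \ge \Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n}\log M_n - \gamma\right\} - e^{-n\gamma}, \tag{3.2.5}$$

where $\gamma > 0$ is an arbitrary constant.

Proof. Set $\beta = e^{-n\gamma}$. By using the relation

$$\frac{1}{n}\log\frac{W^n(\mathbf{y}|\mathbf{x})}{P_{Y^n}(\mathbf{y})} = \frac{1}{n}\log\frac{P_{X^n|Y^n}(\mathbf{x}|\mathbf{y})}{P_{X^n}(\mathbf{x})}$$

and noticing $P_{X^n}(\mathbf{x}) = \frac{1}{M_n}$ for $\mathbf{x} \in \mathcal{C}_n$, we have

$$\frac{1}{n}\log\frac{W^n(\mathbf{y}|\mathbf{x})}{P_{Y^n}(\mathbf{y})} = \frac{1}{n}\log P_{X^n|Y^n}(\mathbf{x}|\mathbf{y}) + \frac{1}{n}\log M_n.$$

Therefore, the first term on the right-hand side of (3.2.5) can be written as

$$\Pr\{P_{X^n|Y^n}(X^n|Y^n) \le \beta\}.$$

By setting

$$L_n = \{(\mathbf{x},\mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n \mid P_{X^n|Y^n}(\mathbf{x}|\mathbf{y}) \le \beta\},$$

(3.2.5) can be expressed as

$$P_{X^nY^n}(L_n) \le \varepsilon_n + \beta. \tag{3.2.6}$$

In order to develop this inequality, we express the code $\mathcal{C}_n$ as $\mathcal{C}_n = \{\mathbf{u}_1, \mathbf{u}_2, \cdots, \mathbf{u}_{M_n}\}$ and, for $i = 1, 2, \cdots, M_n$, denote by $\mathcal{D}_i$ the disjoint decoding region corresponding to $\mathbf{u}_i$. Set

$$B_i = \{\mathbf{y} \in \mathcal{Y}^n \mid P_{X^n|Y^n}(\mathbf{u}_i|\mathbf{y}) \le \beta\}.$$

Then, (3.2.6) is established in the following way:

$$\begin{aligned}
P_{X^nY^n}(L_n) &= \sum_{i=1}^{M_n}P_{X^nY^n}[(\mathbf{u}_i, B_i)] \\
&= \sum_{i=1}^{M_n}P_{X^nY^n}[(\mathbf{u}_i, B_i \cap \mathcal{D}_i^c)] + \sum_{i=1}^{M_n}P_{X^nY^n}[(\mathbf{u}_i, B_i \cap \mathcal{D}_i)] \\
&\le \sum_{i=1}^{M_n}P_{X^nY^n}[(\mathbf{u}_i, \mathcal{D}_i^c)] + \sum_{i=1}^{M_n}P_{X^nY^n}[(\mathbf{u}_i, B_i \cap \mathcal{D}_i)] \\
&= \frac{1}{M_n}\sum_{i=1}^{M_n}W^n(\mathcal{D}_i^c|\mathbf{u}_i) + \sum_{i=1}^{M_n}P_{X^nY^n}[(\mathbf{u}_i, B_i \cap \mathcal{D}_i)] \\
&= \varepsilon_n + \sum_{i=1}^{M_n}\sum_{\mathbf{y} \in B_i \cap \mathcal{D}_i}P_{Y^n}(\mathbf{y})P_{X^n|Y^n}(\mathbf{u}_i|\mathbf{y}) \\
&\le \varepsilon_n + \beta\sum_{i=1}^{M_n}\sum_{\mathbf{y} \in B_i \cap \mathcal{D}_i}P_{Y^n}(\mathbf{y}) \\
&\le \varepsilon_n + \beta\sum_{i=1}^{M_n}P_{Y^n}(\mathcal{D}_i) \le \varepsilon_n + \beta.
\end{aligned}$$

□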



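Now, we return to the proof of the converse part. We show that the assumption that $R = \sup_{\mathbf{X}}\underline{I}(\mathbf{X};\mathbf{Y}) + 3\gamma$ is achievable leads to a contradiction, where $\gamma > 0$ is an arbitrarily small constant. Since $R$ is assumed to be achievable, there exists an $(n, M_n, \varepsilon_n)$-code satisfying

$$\liminf_{n \to \infty}\frac{1}{n}\log M_n \ge R = \sup_{\mathbf{X}}\underline{I}(\mathbf{X};\mathbf{Y}) + 3\gamma \tag{3.2.7}$$

and $\lim_{n \to \infty}\varepsilon_n = 0$. If we define $X^n$ as the input variable uniformly distributed over the code and $Y^n$ the output corresponding to $X^n$, Lemma 3.2.2 implies

$$\varepsilon_n \ge \Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n}\log M_n - \gamma\right\} - e^{-n\gamma}.$$

On the other hand, since (3.2.7) yields

$$\frac{1}{n}\log M_n \ge \sup_{\mathbf{X}}\underline{I}(\mathbf{X};\mathbf{Y}) + 2\gamma \quad (\forall n \ge n_0),$$

it holds that

$$\varepsilon_n \ge \Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \sup_{\mathbf{X}}\underline{I}(\mathbf{X};\mathbf{Y}) + \gamma\right\} - e^{-n\gamma}. \tag{3.2.8}$$

However, while $e^{-n\gamma} \to 0$ as $n \to \infty$, the definition of $\underline{I}(\mathbf{X};\mathbf{Y})$ implies the existence of a constant $\varepsilon_0 > 0$ and infinitely many $n$'s satisfying

$$\Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} < \underline{I}(\mathbf{X};\mathbf{Y}) + \gamma\right\} > \varepsilon_0.$$

This yields a contradiction because (3.2.8) shows that $\varepsilon_n$ cannot satisfy $\varepsilon_n \to 0$ as $n \to \infty$. Consequently, any achievable rate $R$ cannot be greater than $\sup_{\mathbf{X}}\underline{I}(\mathbf{X};\mathbf{Y})$. □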
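The inequality of Lemma 3.2.2, on which the converse rests, can be tested numerically on a toy code. The Python sketch below is an assumption-laden illustration, not the book's construction: each row `w[i]` stands for the whole output distribution $W^n(\cdot|\mathbf{u}_i)$ of one codeword, decoding is maximum likelihood (the lemma holds for any decoder), natural logarithms are used, and the names `verdu_han_check` and `random_rows` are hypothetical.

```python
import math
import random


def verdu_han_check(w, gamma, n=1):
    """Return (eps, lower_bound) for the code whose i-th codeword induces
    the output distribution w[i][y] = W^n(y | u_i).

    eps is the average error probability under maximum-likelihood decoding;
    lower_bound is the right-hand side of (3.2.5).
    """
    m, ny = len(w), len(w[0])
    # Output distribution when the codeword is chosen uniformly at random.
    py = [sum(w[i][y] for i in range(m)) / m for y in range(ny)]
    # Decoding regions D_i: each output y is assigned to one codeword index.
    decode = [max(range(m), key=lambda i: w[i][y]) for y in range(ny)]
    eps = 1.0 - sum(w[decode[y]][y] for y in range(ny)) / m
    threshold = math.log(m) / n - gamma
    prob = 0.0
    for i in range(m):
        for y in range(ny):
            if w[i][y] > 0.0 and math.log(w[i][y] / py[y]) / n <= threshold:
                prob += w[i][y] / m  # P_{X^n Y^n}(u_i, y)
    return eps, prob - math.exp(-n * gamma)


def random_rows(m, ny, rng):
    """m random probability vectors of length ny (illustrative helper)."""
    rows = []
    for _ in range(m):
        weights = [rng.random() for _ in range(ny)]
        s = sum(weights)
        rows.append([v / s for v in weights])
    return rows


rng = random.Random(1)
eps, lower = verdu_han_check(random_rows(4, 6, rng), gamma=0.2)
```

For every choice of rows and of $\gamma > 0$, `eps` should be at least `lower`. The bound is informative when many codewords have information density below $\frac{1}{n}\log M_n - \gamma$, which is the situation exploited in (3.2.8).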



Example 3.2.1 (Verdú and Han [91]). Let us consider the following example as an application of Theorem 3.2.1. Let the input alphabet and the output alphabet be $\mathcal{X} = \mathcal{Y} = \{0, 1\}$. We denote by

$$\mathbf{Z} = \{Z^n = (Z_1^{(n)}, Z_2^{(n)}, \cdots, Z_n^{(n)})\}_{n=1}^{\infty}$$

an arbitrary general source (noise process). For an input process

$$\mathbf{X} = \{X^n = (X_1^{(n)}, X_2^{(n)}, \cdots, X_n^{(n)})\}_{n=1}^{\infty}$$

we define the output process $\mathbf{Y} = \{Y^n = (Y_1^{(n)}, Y_2^{(n)}, \cdots, Y_n^{(n)})\}_{n=1}^{\infty}$ of a channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ corresponding to $\mathbf{X}$ by

$$Y_i^{(n)} = X_i^{(n)} \oplus Z_i^{(n)} \quad (\oplus: \text{exclusive OR}),$$

where $\mathcal{Z} = \{0, 1\}$ and $\mathbf{Z}$ is assumed to be independent of $\mathbf{X}$ (we call such a channel the additive channel). Then, the channel capacity $C(\mathbf{W})$ of this channel is given by

$$C(\mathbf{W}) = \sup_{\mathbf{X}}\underline{I}(\mathbf{X};\mathbf{Y}) = \log 2 - \overline{H}(\mathbf{Z}). \tag{3.2.9}$$

To develop this, we use the following two inequalities:

$$\underline{I}(\mathbf{X};\mathbf{Y}) \le \overline{H}(\mathbf{Y}) - \overline{H}(\mathbf{Y}|\mathbf{X}), \tag{3.2.10}$$

$$\underline{I}(\mathbf{X};\mathbf{Y}) \ge \underline{H}(\mathbf{Y}) - \overline{H}(\mathbf{Y}|\mathbf{X}), \tag{3.2.11}$$

where

$$\overline{H}(\mathbf{Y}|\mathbf{X}) = \text{p-}\limsup_{n \to \infty}\frac{1}{n}\log\frac{1}{W^n(Y^n|X^n)}.$$

These inequalities can be easily verified from the basic properties of the limit superior/inferior in probability given in §1.3. We first obtain

$$\underline{I}(\mathbf{X};\mathbf{Y}) \le \log 2 - \overline{H}(\mathbf{Y}|\mathbf{X})$$

from (3.2.10) since $\overline{H}(\mathbf{Y}) \le \log|\mathcal{Y}|$, as is proved in Theorem 1.7.2 in Chapter 1. We notice here that $\overline{H}(\mathbf{Y}|\mathbf{X}) = \overline{H}(\mathbf{Z})$ due to the assumption that $\mathbf{Z}$ is independent of $\mathbf{X}$. Hence, we obtain

$$\underline{I}(\mathbf{X};\mathbf{Y}) \le \log 2 - \overline{H}(\mathbf{Z}) \tag{3.2.12}$$

for any $\mathbf{X}$. On the other hand, if we define $\mathbf{X}$ as the stationary memoryless process taking 0 or 1 with probability 1/2, we have $\underline{H}(\mathbf{Y}) = \log 2$. This, together with (3.2.11), yields

$$\underline{I}(\mathbf{X};\mathbf{Y}) \ge \log 2 - \overline{H}(\mathbf{Z}). \tag{3.2.13}$$

We now obtain (3.2.9) from the combination of (3.2.12) and (3.2.13). The formula (3.2.9) of the channel capacity can be regarded as a generalization of Parthasarathy's formula [75] for the case that $\mathbf{Z}$ is a stationary process. In particular, the formula contains $C(\mathbf{W}) = \log 2 - h(p)$, i.e., the formula of the channel capacity for the stationary memoryless binary symmetric channel with the crossover probability $p$, as a special case. Here, the stationary memoryless binary symmetric channel $\mathbf{W}$ with crossover probability $p$ means the additive channel with the stationary memoryless noise process $\mathbf{Z} = \{Z\}$ with $P_Z(1) = p$, which satisfies $\overline{H}(\mathbf{Z}) = h(p)$. □

Now, let us consider a way of obtaining Theorem 3.1.1 concerning stationary memoryless channels from Theorem 3.2.1. Since Theorem 3.2.1 is valid for an arbitrary general channel, we can surely obtain Theorem 3.1.1 from Theorem 3.2.1 as a special case. However, simple computation does not reduce Theorem 3.2.1 to Theorem 3.1.1. This is because Theorem 3.2.1 describes the channel capacities of general channels by using a quantity completely different from the mutual information. Hence, it is meaningful to obtain Theorem 3.1.1 from Theorem 3.2.1 by direct computation. Such a computation enables us to clarify the logic connecting results on a general channel with results on particular channels. Before starting the computation, we need some preparation.

Let $\mathbf{W}$ be the stationary memoryless channel specified by a $W : \mathcal{X} \to \mathcal{Y}$. That is, $\mathbf{W} = \{W\}$. Let $X^n = (X_1^{(n)}, \cdots, X_n^{(n)})$ be an arbitrary input variable and $Y^n = (Y_1^{(n)}, Y_2^{(n)}, \cdots, Y_n^{(n)})$ the output variable of $\mathbf{W}$ corresponding to $X^n$. Set $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$. Since the channel is memoryless by assumption, we clearly have

$$P_{Y^n|X^n}(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{n}W(y_i|x_i),$$

where $\mathbf{x} = (x_1, \cdots, x_n)$ and $\mathbf{y} = (y_1, \cdots, y_n)$. Define

$$\overline{\mathbf{X}} = \{\overline{X}^n = (\overline{X}_1^{(n)}, \cdots, \overline{X}_n^{(n)})\}_{n=1}^{\infty}$$

by

$$P_{\overline{X}^n}(\mathbf{x}) = P_{X_1^{(n)}}(x_1) \cdots P_{X_n^{(n)}}(x_n)$$

and denote by $\overline{\mathbf{Y}} = \{\overline{Y}^n = (\overline{Y}_1^{(n)}, \cdots, \overline{Y}_n^{(n)})\}_{n=1}^{\infty}$ the channel output corresponding to $\overline{\mathbf{X}}$. Then, we have the following lemma. This lemma can be regarded as an information-spectrum version of the mutual information inequality

$$\sum_{i=1}^{n}I(X_i^{(n)};Y_i^{(n)}) = I(\overline{X}^n;\overline{Y}^n) \ge I(X^n;Y^n) \tag{3.2.14}$$

for a memoryless channel $\mathbf{W} = \{W\}$.

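Lemma 3.2.3 (Verdú and Han [91]). For any memoryless (but not necessarily stationary) channel $\mathbf{W} = \{W_1, W_2, \cdots\}$, it holds that

$$\underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}}) \ge \underline{I}(\mathbf{X};\mathbf{Y}),$$

where either input alphabet $\mathcal{X}$ or output alphabet $\mathcal{Y}$ is assumed to be finite.

Proof. Define

$$Z_n = \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{\overline{Y}^n}(Y^n)} = \frac{1}{n}\sum_{i=1}^{n}\log\frac{W_i(Y_i^{(n)}|X_i^{(n)})}{P_{Y_i^{(n)}}(Y_i^{(n)})}$$

and set

$$\alpha = \text{p-}\liminf_{n \to \infty}Z_n.$$

We will show that $\alpha \ge \underline{I}(\mathbf{X};\mathbf{Y})$ and $\alpha \le \underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}})$.

1) $\alpha \ge \underline{I}(\mathbf{X};\mathbf{Y})$:

We express $Z_n$ as

$$Z_n = \frac{1}{n}\sum_{i=1}^{n}\log\frac{W_i(Y_i^{(n)}|X_i^{(n)})}{P_{Y_i^{(n)}}(Y_i^{(n)})} = \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} + \frac{1}{n}\log\frac{P_{Y^n}(Y^n)}{\prod_{i=1}^{n}P_{Y_i^{(n)}}(Y_i^{(n)})}.$$

By replacing $U_n$ and $V_n$ in Lemma 3.2.1 with $Y^n$ and $\overline{Y}^n$, respectively, and recalling that $\gamma > 0$ is arbitrary, we obtain

$$\text{p-}\liminf_{n \to \infty}\frac{1}{n}\log\frac{P_{Y^n}(Y^n)}{\prod_{i=1}^{n}P_{Y_i^{(n)}}(Y_i^{(n)})} = \text{p-}\liminf_{n \to \infty}\frac{1}{n}\log\frac{P_{Y^n}(Y^n)}{P_{\overline{Y}^n}(Y^n)} \ge 0.$$

Therefore, it follows that

$$\begin{aligned}
\alpha &= \text{p-}\liminf_{n \to \infty}Z_n \\
&\ge \text{p-}\liminf_{n \to \infty}\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} + \text{p-}\liminf_{n \to \infty}\frac{1}{n}\log\frac{P_{Y^n}(Y^n)}{\prod_{i=1}^{n}P_{Y_i^{(n)}}(Y_i^{(n)})} \\
&\ge \text{p-}\liminf_{n \to \infty}\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \\
&= \underline{I}(\mathbf{X};\mathbf{Y}).
\end{aligned}$$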



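2) $\alpha \le \underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}})$:

Clearly,

$$\frac{1}{n}\log\frac{W^n(\overline{Y}^n|\overline{X}^n)}{P_{\overline{Y}^n}(\overline{Y}^n)} = \frac{1}{n}\sum_{i=1}^{n}\log\frac{W_i(\overline{Y}_i^{(n)}|\overline{X}_i^{(n)})}{P_{\overline{Y}_i^{(n)}}(\overline{Y}_i^{(n)})}.$$

We notice here that, since the terms in the sum on the right-hand side are independent and either the input alphabet or the output alphabet is finite, all of their variances are uniformly bounded (see Remark 3.1.1). Hence, by applying Chebyshev's inequality, we obtain

$$\begin{aligned}
\underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}}) &= \text{p-}\liminf_{n \to \infty}\frac{1}{n}\log\frac{W^n(\overline{Y}^n|\overline{X}^n)}{P_{\overline{Y}^n}(\overline{Y}^n)} \\
&= \liminf_{n \to \infty}\frac{1}{n}\sum_{i=1}^{n}\mathrm{E}\left\{\log\frac{W_i(\overline{Y}_i^{(n)}|\overline{X}_i^{(n)})}{P_{\overline{Y}_i^{(n)}}(\overline{Y}_i^{(n)})}\right\} \\
&= \liminf_{n \to \infty}\frac{1}{n}\sum_{i=1}^{n}I(\overline{X}_i^{(n)};\overline{Y}_i^{(n)}).
\end{aligned} \tag{3.2.15}$$

In addition, recalling $P_{\overline{X}_i^{(n)}\overline{Y}_i^{(n)}} = P_{X_i^{(n)}Y_i^{(n)}}$ leads to

$$\underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}}) = \liminf_{n \to \infty}\frac{1}{n}\sum_{i=1}^{n}I(X_i^{(n)};Y_i^{(n)}) = \liminf_{n \to \infty}\mathrm{E}(Z_n). \tag{3.2.16}$$

Therefore, we have only to establish that the right-hand side of (3.2.16) is greater than or equal to $\alpha$. To this end, assume that the opposite is true. That is, assume that $\underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}}) < \alpha$. Then, since $\underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}}) + \gamma < \alpha$ for some $\gamma > 0$, it holds that

$$\Pr\{Z_n > \underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}}) + \gamma\} \to 1 \quad \text{as } n \to \infty. \tag{3.2.17}$$

On the other hand, by using $\mathrm{E}(Z_n) \ge 0$ and (3.2.16), we have

$$\mathrm{E}(Z_n) \ge \mathrm{E}(Z_n\mathbf{1}[Z_n \le 0]) + \Big(\liminf_{n \to \infty}\mathrm{E}(Z_n) + \gamma\Big)\Pr\{Z_n > \underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}}) + \gamma\}, \tag{3.2.18}$$

where $\mathbf{1}[\,\cdot\,]$ denotes the characteristic function. We now make use of the following lemma.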



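Lemma 3.2.4 (Pinsker [76]). Let $\{U_n\}$ and $\{V_n\}$ be arbitrary sequences of random variables taking values in source alphabets $\{\mathcal{Z}_n\}$. Then, for all $n = 1, 2, \cdots$ we have

$$\mathrm{E}\left\{\log\frac{P_{U_n}(U_n)}{P_{V_n}(U_n)}\,\mathbf{1}\left[\log\frac{P_{U_n}(U_n)}{P_{V_n}(U_n)} \le 0\right]\right\} \ge \frac{1}{e}\log\frac{1}{e}. \tag{3.2.19}$$

Proof. Setting

$$g(\mathbf{z}) = \frac{P_{U_n}(\mathbf{z})}{P_{V_n}(\mathbf{z})},$$

it follows that

$$\begin{aligned}
\mathrm{E}\left\{\log\frac{P_{U_n}(U_n)}{P_{V_n}(U_n)}\,\mathbf{1}\left[\log\frac{P_{U_n}(U_n)}{P_{V_n}(U_n)} \le 0\right]\right\} &= \sum_{\mathbf{z}:\,g(\mathbf{z}) \le 1}P_{U_n}(\mathbf{z})\log g(\mathbf{z}) \\
&= \sum_{\mathbf{z}:\,g(\mathbf{z}) \le 1}P_{V_n}(\mathbf{z})g(\mathbf{z})\log g(\mathbf{z}).
\end{aligned}$$

Then, we obtain

$$\mathrm{E}\left\{\log\frac{P_{U_n}(U_n)}{P_{V_n}(U_n)}\,\mathbf{1}\left[\log\frac{P_{U_n}(U_n)}{P_{V_n}(U_n)} \le 0\right]\right\} \ge \sum_{\mathbf{z}:\,g(\mathbf{z}) \le 1}P_{V_n}(\mathbf{z})\,\frac{1}{e}\log\frac{1}{e} \ge \frac{1}{e}\log\frac{1}{e},$$

since $x\log x \ge e^{-1}\log e^{-1}$ for $0 < x \le 1$. □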



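If we use Lemma 3.2.4, setting $P_{U_n} = W^n(\cdot|\mathbf{x})$ and $P_{V_n} = P_{\overline{Y}^n}$, it follows that

$$\begin{aligned}
n\mathrm{E}(Z_n\mathbf{1}[Z_n \le 0]) &= \sum_{\mathbf{x}}P_{X^n}(\mathbf{x})\,\mathrm{E}(nZ_n\mathbf{1}[nZ_n \le 0] \mid X^n = \mathbf{x}) \\
&\ge \sum_{\mathbf{x}}P_{X^n}(\mathbf{x})\,\frac{1}{e}\log\frac{1}{e} \\
&= \frac{1}{e}\log\frac{1}{e}.
\end{aligned} \tag{3.2.20}$$

Then, substitution of (3.2.20) into (3.2.18) yields

$$\mathrm{E}(Z_n) \ge \frac{1}{ne}\log\frac{1}{e} + \Big(\liminf_{n \to \infty}\mathrm{E}(Z_n) + \gamma\Big)\Pr\{Z_n > \underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}}) + \gamma\},$$

which contradicts (3.2.17) if we take $\liminf_{n \to \infty}$ of both sides. □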

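By virtue of Lemma 3.2.3, it turns out that $\sup_{\mathbf{X}}$ in Theorem 3.2.1 can be taken with respect to independent input processes $\overline{\mathbf{X}} = \{\overline{X}^n = (\overline{X}_1^{(n)}, \cdots, \overline{X}_n^{(n)})\}_{n=1}^{\infty}$ satisfying

$$P_{\overline{X}^n}(\mathbf{x}) = \prod_{i=1}^{n}P_{\overline{X}_i^{(n)}}(x_i).$$

Then, as is shown in the proof of Lemma 3.2.3, Chebyshev's inequality yields

$$\underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}}) = \liminf_{n \to \infty}\frac{1}{n}\sum_{i=1}^{n}I(\overline{X}_i^{(n)};\overline{Y}_i^{(n)}).$$

By setting $C(W) = \sup_X I(X;Y)$, we obtain

$$\underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}}) \le C(W).$$

Next, we define $X$ and $Y$ as the random variables that attain $C(W) = \sup_X I(X;Y)$, and $\overline{X}^n$ and $\overline{Y}^n$ as the random variables determined by

$$P_{\overline{X}^n\overline{Y}^n}(\mathbf{x},\mathbf{y}) = \prod_{i=1}^{n}P_{XY}(x_i,y_i). \tag{3.2.21}$$

Setting $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ and $\overline{\mathbf{Y}} = \{\overline{Y}^n\}_{n=1}^{\infty}$, we obtain

$$\underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}}) = I(X;Y) = C(W)$$

from Chebyshev's inequality again. Consequently, the formula in Theorem 3.2.1 coincides with the formula in Theorem 3.1.1 if the channel is stationary and memoryless. That is, Theorem 3.1.1 is obtained from Theorem 3.2.1. □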

Remark 3.2.3 (Channel capacity of nonstationary memoryless channel). A channel $\mathbf{W} = \{W_1, W_2, \cdots\}$ is called a nonstationary memoryless channel if it outputs an output sequence $\mathbf{y} = (y_1, \cdots, y_n) \in \mathcal{Y}^n$ corresponding to a given input sequence $\mathbf{x} = (x_1, \cdots, x_n) \in \mathcal{X}^n$ with the conditional probability

$$W^n(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{n}W_i(y_i|x_i).$$

The channel capacity $C(\mathbf{W})$ of such a channel is given by

$$C(\mathbf{W}) = \liminf_{n \to \infty}\frac{1}{n}\sum_{i=1}^{n}C(W_i) \tag{3.2.22}$$

under the assumption that either input alphabet $\mathcal{X}$ or output alphabet $\mathcal{Y}$ is finite, which can be verified by a careful check of the argument above yielding Theorem 3.1.1 from Theorem 3.2.1 (cf. Remark 3.1.1). Therefore, for example, if $\lim_{i \to \infty}W_i = W$ in a special case with finite $\mathcal{X}$ and $\mathcal{Y}$, $C(\mathbf{W})$ coincides with $C(W)$ given in Theorem 3.1.1. Another example is the following. If we set

$$W_i = \begin{cases} W_1, & \text{if } i \text{ is odd}, \\ W_2, & \text{if } i \text{ is even}, \end{cases}$$

we have $C(\mathbf{W}) = \frac{1}{2}(C(W_1) + C(W_2))$. On the other hand, if we set

$$J = \{i \mid 2^{2k-1} \le i < 2^{2k},\ k = 1, 2, \cdots\} = \{2, 3, 8, 9, 10, 11, 12, 13, 14, 15, 32, 33, 34, 35, \cdots\} \tag{3.2.23}$$

and consider a nonstationary memoryless channel $\mathbf{W}$ satisfying

$$W_i = \begin{cases} W_1, & \text{if } i \in J, \\ W_2, & \text{if } i \notin J, \end{cases}$$

the channel capacity of $\mathbf{W}$ is given by

$$\begin{aligned}
C(\mathbf{W}) &= \min_{\frac{1}{3} \le \lambda \le \frac{2}{3}}\big(\lambda C(W_1) + (1 - \lambda)C(W_2)\big) \\
&= \frac{2}{3}\min(C(W_1), C(W_2)) + \frac{1}{3}\max(C(W_1), C(W_2)),
\end{aligned}$$

where we use the following two properties:

$$\liminf_{n \to \infty}\frac{1}{n}|J \cap \{1, 2, \cdots, n\}| = \frac{1}{3},$$

$$\limsup_{n \to \infty}\frac{1}{n}|J \cap \{1, 2, \cdots, n\}| = \frac{2}{3}.$$

□
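The two density properties of the set $J$ in (3.2.23), and the resulting capacity expression, are easy to confirm numerically. The following Python sketch is illustrative only: the function names `in_J`, `density_range`, and `liminf_capacity` are hypothetical, and the membership test relies on the observation that $2^{2k-1} \le i < 2^{2k}$ holds exactly when the binary representation of $i$ has an even number of bits.

```python
def in_J(i):
    """True if 2**(2k-1) <= i < 2**(2k) for some k >= 1.

    Such i are exactly the positive integers whose binary length is even.
    """
    return i >= 2 and i.bit_length() % 2 == 0


def density_range(n_min, n_max):
    """Min and max of |J ∩ {1..n}| / n over n_min <= n <= n_max."""
    count = sum(1 for i in range(1, n_min) if in_J(i))
    lo, hi = 1.0, 0.0
    for n in range(n_min, n_max + 1):
        count += in_J(n)  # bool counts as 0 or 1
        ratio = count / n
        lo, hi = min(lo, ratio), max(hi, ratio)
    return lo, hi


def liminf_capacity(c1, c2):
    """Right-hand side of the capacity formula: (2/3) min + (1/3) max."""
    return 2.0 * min(c1, c2) / 3.0 + max(c1, c2) / 3.0


first_members = [i for i in range(1, 36) if in_J(i)]
lo, hi = density_range(2 ** 10, 2 ** 16)
```

Over the window $2^{10} \le n \le 2^{16}$ the ratio oscillates between values close to $1/3$ (attained just before a block of $J$ begins) and $2/3$ (attained at the end of a block), which is why the limit inferior of the running average in (3.2.22) weights the smaller capacity by $2/3$.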






Thus far, while we have succeeded in obtaining Theorem 3.1.1 from Theorem 3.2.1 via Lemma 3.2.3, the argument above requires the assumption that either input alphabet $\mathcal{X}$ or output alphabet $\mathcal{Y}$ is finite so that we can make use of Chebyshev's inequality.

However, we can actually obtain Theorem 3.2.2 (a strengthening of Theorem 3.1.1) from Theorem 3.2.1 without this assumption, i.e., instead of Theorem 3.1.1, we have:

Theorem 3.2.2. Let $\mathcal{X}$ and $\mathcal{Y}$ be any abstract input and output alphabets, respectively. Then, the channel capacity $C(\mathbf{W})$ of a stationary memoryless channel $\mathbf{W} = \{W\}$ is given by

$$C(\mathbf{W}) = C(W) = \sup_X I(X;Y),$$

where $Y$ is the output variable via the channel $W$ due to the input variable $X$ and $\sup_X$ denotes the supremum with respect to all random variables taking values in $\mathcal{X}$.

Proof. We first focus on the inequality

$$\underline{I}(\mathbf{X};\mathbf{Y}) \le \liminf_{n \to \infty}\frac{1}{n}I(X^n;Y^n)$$

that will appear later in Theorem 3.5.2 and holds for any abstract input and output alphabets $\mathcal{X}$, $\mathcal{Y}$. Thus, in view of the inequality (3.2.14) for memoryless channels, we have

$$\underline{I}(\mathbf{X};\mathbf{Y}) \le \liminf_{n \to \infty}\frac{1}{n}\sum_{i=1}^{n}I(X_i^{(n)};Y_i^{(n)}). \tag{3.2.24}$$

Hence, we can directly obtain

$$\underline{I}(\mathbf{X};\mathbf{Y}) \le C(W) = \sup_X I(X;Y)$$

without using Lemma 3.2.3. On the other hand, by defining $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ and $\overline{\mathbf{Y}} = \{\overline{Y}^n\}_{n=1}^{\infty}$ by (3.2.21), where $X$ and $Y$ are the random variables attaining $C(W) = \sup_X I(X;Y)$, and using Khintchin's law of large numbers (Theorem 1.3.2) instead of Chebyshev's inequality, we can obtain

$$\underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}}) = I(X;Y) = C(W).$$

Consequently, we can conclude from Theorem 3.2.1 that Theorem 3.2.2 (and hence also Theorem 3.1.1) holds for any abstract input and output alphabets not restricted to finite sets. □



Remark 3.2.4. A general channel $\mathbf{W} = \{W^n : \mathcal{X}^n \to \mathcal{Y}^n\}_{n=1}^{\infty}$ defined at the beginning of this section is supposed to have an input set $\mathcal{X}^n$ and an output set $\mathcal{Y}^n$ for each $n = 1, 2, \cdots$ that are the Cartesian products of an input alphabet $\mathcal{X}$ and an output alphabet $\mathcal{Y}$, respectively. However, Theorem 3.2.1 still holds if $\mathcal{X}^n$ and $\mathcal{Y}^n$ are replaced with arbitrary sets $\mathcal{X}_n$ and $\mathcal{Y}_n$ ($\mathcal{X}_n$ and $\mathcal{Y}_n$ can be countably infinite or abstract sets not restricted to the Cartesian products), respectively, and $W^n : \mathcal{X}^n \to \mathcal{Y}^n$ with an arbitrary channel $W^n : \mathcal{X}_n \to \mathcal{Y}_n$ (see arguments in §1.12 of Chapter 1). This remark is valid throughout this chapter. □



Let us now consider the information-spectrum of a code used in channel coding. Given an arbitrary $(n, M_n, \varepsilon_n)$-code $\mathcal{C}_n = \{\mathbf{u}_1, \mathbf{u}_2, \cdots, \mathbf{u}_{M_n}\}$ ($\mathbf{u}_i \in \mathcal{X}^n$) for a channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$, we can define the random variable $X^n$ subject to the uniform distribution over $\mathcal{C}_n$ and the output $Y^n$ of the channel $\mathbf{W}$ corresponding to the input $X^n$, similarly to Lemma 3.2.2. Setting $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$, let us call the distribution of

$$\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)},$$

i.e., the distribution of the mutual information density rate of $(\mathbf{X},\mathbf{Y})$, the information-spectrum of the code $\mathcal{C}_n$. How does the distribution of this information-spectrum behave as $n \to \infty$? The following theorem gives an answer to this question.



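Theorem 3.2.3. For any channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$, the information-spectrum of an arbitrary $(n, M_n, \varepsilon_n)$-code $\mathcal{C}_n$ satisfying

$$\lim_{n \to \infty}\varepsilon_n = 0 \quad \text{and} \quad \lim_{n \to \infty}\frac{1}{n}\log M_n = R \tag{3.2.25}$$

converges in probability to the rate $R$. That is,

$$\text{p-}\lim_{n \to \infty}\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} = R. \tag{3.2.26}$$

Proof. Lemma 3.2.2 tells us that for an arbitrary $\gamma > 0$

$$\varepsilon_n \ge \Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n}\log M_n - \gamma\right\} - e^{-n\gamma}.$$

Since $\frac{1}{n}\log M_n \ge R - \gamma$ ($\forall n \ge n_0$) from (3.2.25), it follows that

$$\varepsilon_n \ge \Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R - 2\gamma\right\} - e^{-n\gamma}.$$

Therefore, taking $\lim_{n \to \infty}\varepsilon_n = 0$ and $\lim_{n \to \infty}e^{-n\gamma} = 0$ into account, we obtain

$$\lim_{n \to \infty}\Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R - 2\gamma\right\} = 0. \tag{3.2.27}$$

On the other hand, since $P_{X^n}(\mathbf{x}) = \frac{1}{M_n}$ for $\mathbf{x} \in \mathcal{C}_n$, it follows that

$$\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} = \frac{P_{X^n|Y^n}(X^n|Y^n)}{P_{X^n}(X^n)} \le \frac{1}{P_{X^n}(X^n)} = M_n.$$

Therefore, we have

$$\Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} > \frac{1}{n}\log M_n\right\} = 0. \tag{3.2.28}$$

By noting that (3.2.25) implies $\frac{1}{n}\log M_n \le R + \gamma$ ($\forall n \ge n_0$), substitution of this into (3.2.28) yields

$$\Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} > R + \gamma\right\} = 0 \quad (\forall n \ge n_0). \tag{3.2.29}$$

Since $\gamma > 0$ is arbitrary, (3.2.27) and (3.2.29) imply (3.2.26). □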



3.3 Coding for Mixed Channels 

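Let us consider the channel capacities of mixed channels as an application of the coding theorem for a general channel described so far. Let $\mathbf{W}_1 = \{W_1^n\}_{n=1}^{\infty}$ and $\mathbf{W}_2 = \{W_2^n\}_{n=1}^{\infty}$ be arbitrary general channels. We call the channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ defined by

$$W^n(\mathbf{y}|\mathbf{x}) = \alpha_1W_1^n(\mathbf{y}|\mathbf{x}) + \alpha_2W_2^n(\mathbf{y}|\mathbf{x}) \quad (\mathbf{x} \in \mathcal{X}^n, \mathbf{y} \in \mathcal{Y}^n) \tag{3.3.1}$$

the mixed channel of $\mathbf{W}_1$ and $\mathbf{W}_2$, where $\alpha_1 > 0$ and $\alpha_2 > 0$ are constants with $\alpha_1 + \alpha_2 = 1$. The mixed channel* is one typical example of stationary, but nonergodic, channels with "memory." Mixed channels are important in the sense that they are usually the first step of generalization when we move beyond the simplest stationary memoryless channels in the study of channel coding.

* The mixed channel defined here is the same one Ahlswede [2] originally called the averaged channel.

We need the following lemma in order to obtain the channel capacities of the mixed channels. First, letting $P_{X_1^nY_1^n}$ and $P_{X_2^nY_2^n}$ be arbitrary probability distributions, define another probability distribution $P_{X^nY^n}$ by

$$P_{X^nY^n}(\mathbf{x},\mathbf{y}) = \alpha_1P_{X_1^nY_1^n}(\mathbf{x},\mathbf{y}) + \alpha_2P_{X_2^nY_2^n}(\mathbf{x},\mathbf{y}).$$

Then, it clearly holds that

$$P_{X^n}(\mathbf{x}) = \alpha_1P_{X_1^n}(\mathbf{x}) + \alpha_2P_{X_2^n}(\mathbf{x}),$$

$$P_{Y^n}(\mathbf{y}) = \alpha_1P_{Y_1^n}(\mathbf{y}) + \alpha_2P_{Y_2^n}(\mathbf{y}),$$

etc. Setting

$$\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}, \quad \mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}, \quad \mathbf{X} = \{X^n\}_{n=1}^{\infty},$$

$$\mathbf{Y}_1 = \{Y_1^n\}_{n=1}^{\infty}, \quad \mathbf{Y}_2 = \{Y_2^n\}_{n=1}^{\infty}, \quad \mathbf{Y} = \{Y^n\}_{n=1}^{\infty},$$

we have the following lemma.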



Lemma 3.3.1.

$$\underline{I}(\mathbf{X};\mathbf{Y}) = \min(\underline{I}(\mathbf{X}_1;\mathbf{Y}_1), \underline{I}(\mathbf{X}_2;\mathbf{Y}_2)).$$

Proof. We use the identity†

$$\log\frac{P_{Y^n|X^n}(Y^n|X^n)}{P_{Y^n}(Y^n)} = \log P_{X^nY^n}(X^n,Y^n) - \log P_{X^n}(X^n) - \log P_{Y^n}(Y^n) \tag{3.3.2}$$

and evaluate each term on the right-hand side of (3.3.2) similarly to the proof of Lemma 1.4.1. We can prove the lemma by combining six inequalities corresponding to (1.4.8) and (1.4.11) with the property in Lemma 1.4.2. Here, we need to replace $\text{p-}\limsup_{n \to \infty}$ and $\max$ with $\text{p-}\liminf_{n \to \infty}$ and $\min$, respectively. □

Remark 3.3.1. The information-spectrum of

$$\frac{1}{n}\log\frac{P_{Y^n|X^n}(Y^n|X^n)}{P_{Y^n}(Y^n)}$$

becomes the weighted sum of the information-spectra of

$$\frac{1}{n}\log\frac{P_{Y_1^n|X_1^n}(Y_1^n|X_1^n)}{P_{Y_1^n}(Y_1^n)}$$

and

$$\frac{1}{n}\log\frac{P_{Y_2^n|X_2^n}(Y_2^n|X_2^n)}{P_{Y_2^n}(Y_2^n)}$$

according to weights $\alpha_1$ and $\alpha_2$, respectively, asymptotically as $n \to \infty$ (see also Remark 1.4.1 in Chapter 1). □

† In the case where input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$ are general alphabets (not restricted to countably infinite alphabets), we interpret $P_{Y^n|X^n}(\mathbf{y}|\mathbf{x})$, $P_{X^nY^n}(\mathbf{x},\mathbf{y})$, $P_{X^n}(\mathbf{x})$ and $P_{Y^n}(\mathbf{y})$ in (3.3.2) to denote the corresponding (conditional) probability measures $P_{Y^n|X^n}(d\mathbf{y}|\mathbf{x})$, $P_{X^nY^n}(d\mathbf{x},d\mathbf{y})$, $P_{X^n}(d\mathbf{x})$, and $P_{Y^n}(d\mathbf{y})$, respectively. Here, $g_n(\mathbf{y}|\mathbf{x}) = \frac{P_{Y^n|X^n}(d\mathbf{y}|\mathbf{x})}{P_{Y^n}(d\mathbf{y})}$ on the left-hand side of (3.3.2) is interpreted to be the Radon-Nikodym derivative between two probability measures on $\mathcal{Y}^n$. With this interpretation, the right-hand side of (3.3.2) gives a symbolic expression for the left-hand side of (3.3.2). This symbolic interpretation makes the necessary calculations intuitively clearer.



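Now, let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ be an arbitrary input process and define

$$P_{X^nY_1^n}(\mathbf{x},\mathbf{y}) = P_{X^n}(\mathbf{x})W_1^n(\mathbf{y}|\mathbf{x}),$$

$$P_{X^nY_2^n}(\mathbf{x},\mathbf{y}) = P_{X^n}(\mathbf{x})W_2^n(\mathbf{y}|\mathbf{x}),$$

$$P_{X^nY^n}(\mathbf{x},\mathbf{y}) = P_{X^n}(\mathbf{x})W^n(\mathbf{y}|\mathbf{x}),$$

where $W^n$ is the mixed channel defined in (3.3.1). Then, it clearly holds that

$$P_{X^nY^n}(\mathbf{x},\mathbf{y}) = \alpha_1P_{X^nY_1^n}(\mathbf{x},\mathbf{y}) + \alpha_2P_{X^nY_2^n}(\mathbf{x},\mathbf{y}).$$

If we set $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$, $\mathbf{Y}_1 = \{Y_1^n\}_{n=1}^{\infty}$ and $\mathbf{Y}_2 = \{Y_2^n\}_{n=1}^{\infty}$, Lemma 3.3.1 implies

$$\underline{I}(\mathbf{X};\mathbf{Y}) = \min(\underline{I}(\mathbf{X};\mathbf{Y}_1), \underline{I}(\mathbf{X};\mathbf{Y}_2)).$$

Then, in view of Theorem 3.2.1 we have the following result.

Theorem 3.3.1. The channel capacity $C(\mathbf{W})$ of the mixed channel $\mathbf{W}$ defined in (3.3.1) is given by

$$C(\mathbf{W}) = \sup_{\mathbf{X}}\min(\underline{I}(\mathbf{X};\mathbf{Y}_1), \underline{I}(\mathbf{X};\mathbf{Y}_2)). \tag{3.3.3}$$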
X 

Here, let us evaluate the right-hand side of (3.3.3) for the case that both 
Wi and W 2 are stationary memoryless channels expressed as Wi = {Wi} 
and W 2 = {^^ 2 }, respectively. First, we have 

I(X;Yi) < Uminf 

i=l 

I(X;Y 2 ) < liminf 

i=l 

as are shown in (3.2.24) in the proof of Theorem 3.2.2, where 

— l-^ll 5 • ■ * 7 -fin b 

Y^'^Y 

^2 — \^21 5 • • • ? -f 2n )‘ 

Define as the random variable satisfying Pr for z == 

1, 2, • • • , n and and Y^^^ by 

^(n)y^(n)^^(n) _ ^ given = i. 

Then, we have the following upper bounds of 7(X;Yi) and 7(X;Y2) ex- 
pressed in terms of the conditional mutual information: 



3.3 Coding for Mixed Channels 193 



/(X; Yi) < (3.3.4) 

n— )-oo 

J(X;Y 2 ) < liminf7(X(”^Y2^”^|Q(")). (3.3.5) 

n-^oo 

Since the channels Wi and W 2 are stationary and memoryless, it clearly holds 
that 

P^^„)Y^r,.){x,y) = PxM{x)Wi{y\x), 

■Px(„)yw(a;,2/) = Px(.^'-){x)W2{v\x), 

and therefore, — > y/”^ and ^ form Markov 



chains. From a property of the Markov chains, we have 

7(X("); y/”^|g("^) < 7(X(”)g(”);y/”^) = 7(X<");y/"^), (3.3.6) 

7(X(");y2^”'|Q(”)) < 7(X(”)g(”);y2^"^) = 7(X(”);y2^"^) (3.3.7) 

(cf. Cover and Thomas [17]). Then, (3.3.4)-(3.3.7) tell us that for an arbitrarily
small $\gamma > 0$ there exists a sufficiently large $n$ satisfying

$$\underline{I}(\mathbf{X};\mathbf{Y}_1) \le I(X^{(n)};Y_1^{(n)}) + \gamma, \qquad (3.3.8)$$

$$\underline{I}(\mathbf{X};\mathbf{Y}_2) \le I(X^{(n)};Y_2^{(n)}) + \gamma. \qquad (3.3.9)$$
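The time-sharing step behind (3.3.6) and (3.3.7) can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes a finite alphabet, represents the channel by a hypothetical matrix `W[x, y]`, and takes $Q$ uniform over a handful of per-letter input distributions. The helper names `mutual_information` and `time_sharing_gap` are ours.

```python
import numpy as np

def mutual_information(p_x, W):
    """I(X;Y) in nats for an input distribution p_x and a channel matrix W[x, y]."""
    p_x = np.asarray(p_x, dtype=float)
    W = np.asarray(W, dtype=float)
    p_xy = p_x[:, None] * W                 # joint distribution of (X, Y)
    p_y = p_xy.sum(axis=0)                  # output marginal
    denom = p_x[:, None] * p_y[None, :]
    mask = p_xy > 0                         # skip zero-probability cells
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / denom[mask])))

def time_sharing_gap(per_letter_inputs, W):
    """Return (I(X;Y|Q), I(X;Y)) when Q is uniform over the given input distributions."""
    P = np.asarray(per_letter_inputs, dtype=float)
    conditional = float(np.mean([mutual_information(p, W) for p in P]))
    averaged = mutual_information(P.mean(axis=0), W)   # X has the Q-averaged distribution
    return conditional, averaged
```

Since the channel does not depend on $Q$, the first returned value never exceeds the second, which is the inequality used above.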



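Hence, it follows that

$$\sup_{\mathbf{X}} \min(\underline{I}(\mathbf{X};\mathbf{Y}_1), \underline{I}(\mathbf{X};\mathbf{Y}_2))
\le \sup_{X^{(n)}} \min(I(X^{(n)};Y_1^{(n)}), I(X^{(n)};Y_2^{(n)})) + \gamma
\le \sup_{X} \min(I(X;Y_1), I(X;Y_2)) + \gamma,$$

where $Y_1$ and $Y_2$ denote the outputs of the channels $W_1$ and $W_2$ with $X$ as
the input, respectively. By letting $\gamma \to 0$, we have

$$\sup_{\mathbf{X}} \min(\underline{I}(\mathbf{X};\mathbf{Y}_1), \underline{I}(\mathbf{X};\mathbf{Y}_2)) \le \sup_{X} \min(I(X;Y_1), I(X;Y_2)). \qquad (3.3.10)$$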

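On the other hand, if we choose $X = \overline{X}$ attaining the supremum on the right-hand
side of (3.3.10), denote by $\overline{Y}_1$ and $\overline{Y}_2$ the outputs from the channels
$W_1$ and $W_2$ with $\overline{X}$ as the input, respectively, and define $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$,
$\overline{\mathbf{Y}}_1 = \{\overline{Y}_1^n\}_{n=1}^{\infty}$ and $\overline{\mathbf{Y}}_2 = \{\overline{Y}_2^n\}_{n=1}^{\infty}$ by

$$P_{\overline{X}^n\overline{Y}_1^n}(\mathbf{x},\mathbf{y}) = \prod_{i=1}^{n} P_{\overline{X}\,\overline{Y}_1}(x_i, y_i),$$

$$P_{\overline{X}^n\overline{Y}_2^n}(\mathbf{x},\mathbf{y}) = \prod_{i=1}^{n} P_{\overline{X}\,\overline{Y}_2}(x_i, y_i),$$

then Khintchin's law of large numbers (Theorem 1.3.2) guarantees that

$$\min(\underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}}_1),\ \underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}}_2))$$

coincides with the right-hand side of (3.3.10). Therefore, the inequality in
(3.3.10) holds with equality. Summarizing, we have the following theorem: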






Theorem 3.3.2. For an arbitrary input alphabet $\mathcal{X}$ and an arbitrary output
alphabet $\mathcal{Y}$ ($\mathcal{X}$ and $\mathcal{Y}$ are not restricted to finite sets), the channel capacity
$C(\mathbf{W}) = C(\alpha_1, \alpha_2; \mathbf{W}_1, \mathbf{W}_2)$ of the mixed channel $\mathbf{W}$ of two stationary
memoryless channels $\mathbf{W}_1 = \{W_1\}$ and $\mathbf{W}_2 = \{W_2\}$ is given by

$$C(\alpha_1, \alpha_2; \mathbf{W}_1, \mathbf{W}_2) = \sup_{X} \min(I(X;Y_1),\ I(X;Y_2)), \qquad (3.3.11)$$

where $Y_1$ and $Y_2$ denote the outputs of the channels $W_1$ and $W_2$ with $X$ as
the input, respectively. □
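As an illustration of (3.3.11), the supremum can be approximated by a grid search when the input alphabet is binary. This is only a sketch under assumptions that are not in the text: both channels have a binary input, are given as hypothetical matrices `W[x, y]`, and the grid resolution is arbitrary. The function names are ours.

```python
import numpy as np

def mutual_information(p_x, W):
    """I(X;Y) in nats for an input distribution p_x and a channel matrix W[x, y]."""
    p_x = np.asarray(p_x, dtype=float)
    W = np.asarray(W, dtype=float)
    p_xy = p_x[:, None] * W
    p_y = p_xy.sum(axis=0)
    denom = p_x[:, None] * p_y[None, :]
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / denom[mask])))

def mixed_capacity_two(W1, W2, grid=2001):
    """Approximate sup_X min(I(X;Y1), I(X;Y2)) over binary input distributions."""
    best_rate, best_p = -1.0, None
    for p in np.linspace(0.0, 1.0, grid):
        p_x = np.array([1.0 - p, p])
        rate = min(mutual_information(p_x, W1), mutual_information(p_x, W2))
        if rate > best_rate:
            best_rate, best_p = rate, p
    return best_rate, best_p
```

For two binary symmetric channels the uniform input maximizes both mutual informations at once, so the search returns the capacity of the noisier channel; note that the weights $\alpha_1, \alpha_2$ do not enter the computation at all.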



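Here, let us consider the mixed channel of infinitely many channels. That
is, for arbitrarily given infinitely many general channels $\mathbf{W}_i = \{W_i^n\}_{n=1}^{\infty}$
($i = 1, 2, \cdots$), we call the channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ defined by

$$W^n(\mathbf{y}|\mathbf{x}) = \sum_{i=1}^{\infty} \alpha_i W_i^n(\mathbf{y}|\mathbf{x}) \quad (\forall n = 1, 2, \cdots;\ \forall \mathbf{x} \in \mathcal{X}^n,\ \forall \mathbf{y} \in \mathcal{Y}^n) \qquad (3.3.12)$$

the mixed channel of the channel family $\{\mathbf{W}_i\}_{i=1}^{\infty}$, where $\alpha_i$ ($i = 1, 2, \cdots$) are
constants satisfying

$$\sum_{i=1}^{\infty} \alpha_i = 1 \quad (\alpha_i \ge 0:\ \forall i = 1, 2, \cdots).$$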

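We need the following lemma in order to obtain a formula of the channel
capacity of this mixed channel. Let $P_{X_i^nY_i^n}$ ($i = 1, 2, \cdots$) be an arbitrary
probability distribution over $\mathcal{X}^n \times \mathcal{Y}^n$ and define another probability distribution
by

$$P_{X^nY^n}(\mathbf{x},\mathbf{y}) = \sum_{i=1}^{\infty} \alpha_i P_{X_i^nY_i^n}(\mathbf{x},\mathbf{y}). \qquad (3.3.13)$$

Then it holds that

$$P_{X^n}(\mathbf{x}) = \sum_{i=1}^{\infty} \alpha_i P_{X_i^n}(\mathbf{x}), \qquad (3.3.14)$$

$$P_{Y^n}(\mathbf{y}) = \sum_{i=1}^{\infty} \alpha_i P_{Y_i^n}(\mathbf{y}) \qquad (3.3.15)$$

and so on. Setting

$$(\mathbf{X}_i,\mathbf{Y}_i) = \{(X_i^n, Y_i^n)\}_{n=1}^{\infty} \quad (i = 1, 2, \cdots),$$

$$(\mathbf{X},\mathbf{Y}) = \{(X^n, Y^n)\}_{n=1}^{\infty},$$

we have the following lemma on the spectral inf-mutual information rate that
is a generalized version of Lemma 3.3.1.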






Lemma 3.3.2.

$$\underline{I}(\mathbf{X};\mathbf{Y}) = \inf_{i:\,\alpha_i > 0} \underline{I}(\mathbf{X}_i;\mathbf{Y}_i). \qquad (3.3.16)$$



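Proof. Notice that we can assume $\alpha_i > 0$ ($i = 1, 2, \cdots$) without loss of generality.

1) First, we prove

$$\underline{I}(\mathbf{X};\mathbf{Y}) \le \inf_{i \ge 1} \underline{I}(\mathbf{X}_i;\mathbf{Y}_i). \qquad (3.3.17)$$

To this end, for an arbitrary positive integer $k$ we set

$$\overline{\alpha}_{k+1} = \sum_{i=k+1}^{\infty} \alpha_i$$

and define the channel $\overline{\mathbf{W}}_k = \{\overline{W}_k^n\}_{n=1}^{\infty}$ by

$$\overline{W}_k^n(\mathbf{y}|\mathbf{x}) = \frac{1}{\overline{\alpha}_{k+1}} \sum_{i=k+1}^{\infty} \alpha_i W_i^n(\mathbf{y}|\mathbf{x}).$$

Then, the channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ can be expressed as the mixed channel of
a finite number of channels $\mathbf{W}_i$ ($i = 1, \cdots, k$) and $\overline{\mathbf{W}}_k$ as follows:

$$W^n(\mathbf{y}|\mathbf{x}) = \sum_{i=1}^{k} \alpha_i W_i^n(\mathbf{y}|\mathbf{x}) + \overline{\alpha}_{k+1} \overline{W}_k^n(\mathbf{y}|\mathbf{x}).$$

Therefore, we obtain

$$\underline{I}(\mathbf{X};\mathbf{Y}) = \min\{\underline{I}(\mathbf{X}_1;\mathbf{Y}_1), \cdots, \underline{I}(\mathbf{X}_k;\mathbf{Y}_k), \underline{I}(\overline{\mathbf{X}}_k;\overline{\mathbf{Y}}_k)\} \le \underline{I}(\mathbf{X}_k;\mathbf{Y}_k)$$

by using Lemma 3.3.1 repeatedly for $k$ times. We now obtain (3.3.17) since
$k$ is arbitrary.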



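2) Next, we prove

$$\underline{I}(\mathbf{X};\mathbf{Y}) \ge \inf_{i \ge 1} \underline{I}(\mathbf{X}_i;\mathbf{Y}_i). \qquad (3.3.18)$$

From (3.3.13) we have

$$P_{X_i^nY_i^n}(\mathbf{x},\mathbf{y}) \le \frac{1}{\alpha_i} P_{X^nY^n}(\mathbf{x},\mathbf{y}) \quad (\forall i = 1, 2, \cdots;\ \forall \mathbf{x} \in \mathcal{X}^n, \mathbf{y} \in \mathcal{Y}^n). \qquad (3.3.19)$$

On the other hand, as is proved in Lemma 1.4.1 in Chapter 1, for all $i = 1, 2, \cdots$
it holds that

$$\Pr\left\{\frac{1}{n}\log P_{X^n}(X_i^n) \le \frac{1}{n}\log P_{X_i^n}(X_i^n) + \gamma_n\right\} \ge 1 - e^{-n\gamma_n}, \qquad (3.3.20)$$

$$\Pr\left\{\frac{1}{n}\log P_{Y^n}(Y_i^n) \le \frac{1}{n}\log P_{Y_i^n}(Y_i^n) + \gamma_n\right\} \ge 1 - e^{-n\gamma_n}, \qquad (3.3.21)$$

where

$$\gamma_n \to 0,\quad n\gamma_n \to \infty \quad \text{as } n \to \infty. \qquad (3.3.22)$$

Hence, due to (3.3.19)-(3.3.21) it holds that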



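$$\begin{aligned}
&\Pr\left\{\frac{1}{n}\log\frac{P_{Y^n|X^n}(Y_i^n|X_i^n)}{P_{Y^n}(Y_i^n)} \le R\right\} \\
&= \Pr\left\{\frac{1}{n}\log\frac{P_{X^nY^n}(X_i^n,Y_i^n)}{P_{X^n}(X_i^n)P_{Y^n}(Y_i^n)} \le R\right\} \\
&= \Pr\left\{\frac{1}{n}\log P_{X^nY^n}(X_i^n,Y_i^n) - \frac{1}{n}\log P_{X^n}(X_i^n) - \frac{1}{n}\log P_{Y^n}(Y_i^n) \le R\right\} \\
&\le \Pr\left\{\frac{1}{n}\log P_{X_i^nY_i^n}(X_i^n,Y_i^n) - \frac{1}{n}\log P_{X^n}(X_i^n) - \frac{1}{n}\log P_{Y^n}(Y_i^n) \le R + \frac{1}{n}\log\frac{1}{\alpha_i}\right\} \\
&\le \Pr\left\{\frac{1}{n}\log P_{X_i^nY_i^n}(X_i^n,Y_i^n) - \frac{1}{n}\log P_{X_i^n}(X_i^n) - \frac{1}{n}\log P_{Y_i^n}(Y_i^n) \le R + \frac{1}{n}\log\frac{1}{\alpha_i} + 2\gamma_n\right\} + 2e^{-n\gamma_n} \\
&= \Pr\left\{\frac{1}{n}\log\frac{P_{Y_i^n|X_i^n}(Y_i^n|X_i^n)}{P_{Y_i^n}(Y_i^n)} \le R + \frac{1}{n}\log\frac{1}{\alpha_i} + 2\gamma_n\right\} + 2e^{-n\gamma_n}
\end{aligned} \qquad (3.3.23)$$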



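for all $i = 1, 2, \cdots$. Then, by using (3.3.13) it follows that

$$\begin{aligned}
&\Pr\left\{\frac{1}{n}\log\frac{P_{Y^n|X^n}(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R\right\} \\
&= \sum_{i=1}^{\infty} \alpha_i \Pr\left\{\frac{1}{n}\log\frac{P_{Y^n|X^n}(Y_i^n|X_i^n)}{P_{Y^n}(Y_i^n)} \le R\right\} \\
&\le \sum_{i=1}^{\infty} \alpha_i \Pr\left\{\frac{1}{n}\log\frac{P_{Y_i^n|X_i^n}(Y_i^n|X_i^n)}{P_{Y_i^n}(Y_i^n)} \le R + \frac{1}{n}\log\frac{1}{\alpha_i} + 2\gamma_n\right\} + 2e^{-n\gamma_n}.
\end{aligned} \qquad (3.3.24)$$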



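Thus, Fatou's lemma and (3.3.22) tell us that

$$\limsup_{n\to\infty} \Pr\left\{\frac{1}{n}\log\frac{P_{Y^n|X^n}(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R\right\}
\le \sum_{i=1}^{\infty} \alpha_i \limsup_{n\to\infty} \Pr\left\{\frac{1}{n}\log\frac{P_{Y_i^n|X_i^n}(Y_i^n|X_i^n)}{P_{Y_i^n}(Y_i^n)} \le R + \frac{1}{n}\log\frac{1}{\alpha_i} + 2\gamma_n\right\}. \qquad (3.3.25)$$

Now, we fix $R$ satisfying $R < \inf_{i \ge 1} \underline{I}(\mathbf{X}_i;\mathbf{Y}_i)$ arbitrarily. Then, by noticing
(3.3.22), for all $i = 1, 2, \cdots$ it holds that

$$\limsup_{n\to\infty} \Pr\left\{\frac{1}{n}\log\frac{P_{Y_i^n|X_i^n}(Y_i^n|X_i^n)}{P_{Y_i^n}(Y_i^n)} \le R + \frac{1}{n}\log\frac{1}{\alpha_i} + 2\gamma_n\right\} = 0,$$

which, together with (3.3.25), leads to

$$\limsup_{n\to\infty} \Pr\left\{\frac{1}{n}\log\frac{P_{Y^n|X^n}(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R\right\} = 0. \qquad (3.3.26)$$

Therefore, we obtain $\underline{I}(\mathbf{X};\mathbf{Y}) \ge R$, which means (3.3.18). □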



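If for $i = 1, 2, \cdots$ we set $\mathbf{X}_i = \mathbf{X}$ in Lemma 3.3.2 and denote by $\mathbf{Y}_i$ the
output of the channel $\mathbf{W}_i$ with $\mathbf{X}$ as the input, Theorem 3.2.1 immediately
yields the following theorem:

Theorem 3.3.3. The channel capacity $C(\mathbf{W})$ of the mixed channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$
of arbitrary channels $\mathbf{W}_i = \{W_i^n\}_{n=1}^{\infty}$ ($i = 1, 2, \cdots$) determined by
(3.3.12) is given by

$$C(\mathbf{W}) = \sup_{\mathbf{X}} \inf_{i:\,\alpha_i > 0} \underline{I}(\mathbf{X};\mathbf{Y}_i). \qquad (3.3.27)$$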

If we consider the special case that the channels $\mathbf{W}_i$ ($i = 1, 2, \cdots$) are
stationary and memoryless, we obtain the following theorem:

Theorem 3.3.4 (Ahlswede [2]). If $\mathcal{X}$ is a finite input alphabet, the channel
capacity $C(\mathbf{W}) = C(\{\alpha_i, \mathbf{W}_i\}_{i=1}^{\infty})$ of the mixed channel $\mathbf{W}$ of stationary
memoryless channels $\mathbf{W}_i = \{W_i\}$ ($i = 1, 2, \cdots$) is given by

$$C(\{\alpha_i, \mathbf{W}_i\}_{i=1}^{\infty}) = \sup_{X} \inf_{i:\,\alpha_i > 0} I(X;Y_i), \qquad (3.3.28)$$

where $X$ denotes an input variable over $\mathcal{X}$ and for each $i = 1, 2, \cdots$ $Y_i$ denotes
the output of the channel $W_i$ with $X$ as the input.



Proof. The claim of this theorem can be proved by using Theorem 3.3.3
similarly to the way of developing Theorem 3.3.2 from Theorem 3.3.1. Note
here that, due to the assumption that $\mathcal{X}$ is a finite input alphabet, we can
use

$$\underline{I}(\mathbf{X};\mathbf{Y}_i) \le I(\overline{X};\overline{Y}_i) \quad (i = 1, 2, \cdots)$$

instead of (3.3.8) and (3.3.9) in the proof of Theorem 3.3.2, where $P_{\overline{X}}$ is one
of the accumulating points of the $|\mathcal{X}|$-dimensional vectors $P_{X^{(n)}}$ ($n = 1, 2, \cdots$) and
$\overline{Y}_i$ denotes the output of the channel $W_i$ corresponding to the input $\overline{X}$ for
$i = 1, 2, \cdots$. □






We now treat the compound channel, which is deeply related to the mixed channel.
Suppose that infinitely many general channels $\mathbf{W}_i = \{W_i^n\}_{n=1}^{\infty}$ ($i = 1, 2, \cdots$)
are given. Let

$$\mathcal{M}_n = \{1, 2, \cdots, M_n\}$$

be a message set and fix an encoder $\varphi_n : \mathcal{M}_n \to \mathcal{X}^n$ and a decoder
$\psi_n : \mathcal{Y}^n \to \mathcal{M}_n$. Then, for each channel $\mathbf{W}_i$ the error probability is determined
by

$$\varepsilon_n^{(i)} = \frac{1}{M_n} \sum_{k=1}^{M_n} W_i^n(\mathcal{D}_k^c \,|\, \varphi_n(k)) \qquad (3.3.29)$$

for $i = 1, 2, \cdots$, where $\mathcal{D}_k = \psi_n^{-1}(k) \subset \mathcal{Y}^n$ denotes the decoding region of
a message $k \in \mathcal{M}_n$ and the superscript "c" means the complement. Notice
here that the pair $(\varphi_n, \psi_n)$ of an encoder and a decoder does not depend
on the channels $\mathbf{W}_i$ ($i = 1, 2, \cdots$). This kind of situation occurs when a
message is transmitted through one of the channels $\mathbf{W}_i$ ($i = 1, 2, \cdots$) but
neither the encoder nor the decoder knows the channel through which
the message is actually transmitted. Such a channel is called the compound
channel $\mathcal{W} = \{\mathbf{W}_i\}_{i=1}^{\infty}$ (Blackwell, Breiman and Thomasian [10], Wolfowitz
[102]). In the problem of the compound channel we would like to make the
error probability small for any channel through which a message is actually
transmitted. We call a pair of an encoder and a decoder with the message set
of size $M_n$ and the error probabilities $\varepsilon_n^{(i)}$ ($i = 1, 2, \cdots$) an $(n, M_n, (\varepsilon_n^{(i)})_{i=1}^{\infty})$-code.
We give two definitions on the compound channel.



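Definition 3.3.1.

Rate $R$ is achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists an $(n, M_n, (\varepsilon_n^{(i)})_{i=1}^{\infty})$-code
satisfying $\lim_{n\to\infty} \varepsilon_n^{(i)} = 0$ ($\forall i = 1, 2, \cdots$) and

$$\liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R.$$

Definition 3.3.2 (Channel capacity of the compound channel).

$$C(\mathcal{W}) = \sup\{R \mid R \text{ is achievable}\}.$$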



Then, we have the following theorem that describes a relationship between 
the channel capacities of the mixed channel and the compound channel. 

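Theorem 3.3.5. The channel capacity $C(\mathbf{W}) = C(\{\alpha_i, \mathbf{W}_i\}_{i=1}^{\infty})$ of the
mixed channel $\mathbf{W}$ of infinitely many general channels $\{\mathbf{W}_i\}_{i=1}^{\infty}$ is equal
to the channel capacity $C(\mathcal{W}) = C(\{\mathbf{W}_i\}_{i=1}^{\infty})$ of the compound channel
$\mathcal{W} = \{\mathbf{W}_i\}_{i=1}^{\infty}$. That is, it holds that

$$C(\{\alpha_i, \mathbf{W}_i\}_{i=1}^{\infty}) = C(\{\mathbf{W}_i\}_{i=1}^{\infty}). \qquad (3.3.30)$$

Here, we assume that $\alpha_i > 0$ for all $i = 1, 2, \cdots$.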






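Proof. Suppose that a rate $R$ is achievable for the compound channel. Then,
there exists an $(n, M_n, (\varepsilon_n^{(i)})_{i=1}^{\infty})$-code $(\varphi_n, \psi_n)$ satisfying

$$\liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R, \qquad (3.3.31)$$

$$\lim_{n\to\infty} \varepsilon_n^{(i)} = 0 \quad (i = 1, 2, \cdots). \qquad (3.3.32)$$

If this code is used for the mixed channel, in view of (3.3.12) and (3.3.29) the
error probability $\varepsilon_n$ can be evaluated in the following way:

$$\begin{aligned}
\varepsilon_n &= \frac{1}{M_n} \sum_{k=1}^{M_n} W^n(\mathcal{D}_k^c \,|\, \varphi_n(k)) \\
&= \frac{1}{M_n} \sum_{k=1}^{M_n} \left[\sum_{i=1}^{\infty} \alpha_i W_i^n(\mathcal{D}_k^c \,|\, \varphi_n(k))\right] \\
&= \sum_{i=1}^{\infty} \alpha_i \frac{1}{M_n} \sum_{k=1}^{M_n} W_i^n(\mathcal{D}_k^c \,|\, \varphi_n(k)) \\
&= \sum_{i=1}^{\infty} \alpha_i \varepsilon_n^{(i)}.
\end{aligned}$$

Hence, (3.3.32) and Fatou's lemma yield $\lim_{n\to\infty} \varepsilon_n = 0$. This means that $R$ is
achievable as a rate for the mixed channel, i.e.,

$$C(\{\mathbf{W}_i\}_{i=1}^{\infty}) \le C(\{\alpha_i, \mathbf{W}_i\}_{i=1}^{\infty}).$$

Next, suppose that a rate $R$ is achievable for the mixed channel. Then,
there exists an $(n, M_n, \varepsilon_n)$-code for the mixed channel satisfying

$$\liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R, \qquad (3.3.33)$$

$$\lim_{n\to\infty} \varepsilon_n = 0. \qquad (3.3.34)$$

Setting

$$\varepsilon_n^{(i)} = \frac{1}{M_n} \sum_{k=1}^{M_n} W_i^n(\mathcal{D}_k^c \,|\, \varphi_n(k)),$$

it follows from (3.3.12) that

$$\varepsilon_n = \sum_{i=1}^{\infty} \alpha_i \varepsilon_n^{(i)},$$

which yields $\varepsilon_n^{(i)} \le \varepsilon_n / \alpha_i$. Consequently, we have $\lim_{n\to\infty} \varepsilon_n^{(i)} = 0$ for all $i = 1, 2, \cdots$
from (3.3.34). This means that $R$ is achievable as a rate of the compound
channel. That is,

$$C(\{\alpha_i, \mathbf{W}_i\}_{i=1}^{\infty}) \le C(\{\mathbf{W}_i\}_{i=1}^{\infty}). \qquad □$$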
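The identity $\varepsilon_n = \sum_i \alpha_i \varepsilon_n^{(i)}$ and the bound $\varepsilon_n^{(i)} \le \varepsilon_n/\alpha_i$ used in this proof can be verified by brute force on a toy example. The sketch below rests on illustrative assumptions of our own: the component channels are binary symmetric channels, the code is a length-5 repetition code with a majority decoder, and all helper names are hypothetical.

```python
from itertools import product

def bsc_block_prob(y, x, p):
    """W^n(y|x) for a memoryless BSC with crossover probability p."""
    d = sum(a != b for a, b in zip(x, y))
    return (p ** d) * ((1.0 - p) ** (len(x) - d))

def average_error(channel, codewords, decode, n):
    """(1/M_n) * sum_k W^n(D_k^c | phi_n(k)), enumerating all outputs y."""
    total = 0.0
    for k, x in enumerate(codewords):
        for y in product((0, 1), repeat=n):
            if decode(y) != k:          # y lies outside the decoding region D_k
                total += channel(y, x)
    return total / len(codewords)

def mixed_and_component_errors(alphas, crossovers, n=5):
    """Return ([eps_n^(i)], eps_n) for a toy repetition code (an assumption)."""
    codewords = [(0,) * n, (1,) * n]
    decode = lambda y: int(sum(y) > n / 2)   # majority rule, channel-independent
    components = [
        average_error(lambda y, x, p=p: bsc_block_prob(y, x, p), codewords, decode, n)
        for p in crossovers
    ]
    # The mixed channel law is the alpha-weighted sum of the component laws.
    mixed_channel = lambda y, x: sum(
        a * bsc_block_prob(y, x, p) for a, p in zip(alphas, crossovers)
    )
    mixed = average_error(mixed_channel, codewords, decode, n)
    return components, mixed
```

The same encoder and decoder are used for every component channel, which is exactly the compound-channel setting.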






The combination of Theorem 3.3.4 and Theorem 3.3.5 immediately yields the 
following corollary on the compound channel. 

Corollary 3.3.1 (Wolfowitz [102]). If $\mathcal{X}$ is a finite input alphabet, the channel
capacity $C(\mathcal{W}) = C(\{\mathbf{W}_i\}_{i=1}^{\infty})$ for the compound channel of stationary
memoryless channels $\mathbf{W}_i = \{W_i\}$ ($i = 1, 2, \cdots$) is given by

$$C(\{\mathbf{W}_i\}_{i=1}^{\infty}) = \sup_{X} \inf_{i} I(X;Y_i), \qquad (3.3.35)$$

where $X$ is the input variable over $\mathcal{X}$ and $Y_i$ denotes the output of the channel
$W_i$ corresponding to $X$ for $i = 1, 2, \cdots$. □



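Now, let us consider the mixed channel again. We consider here the mixed
channel with a more general way of mixing compared with (3.3.12) (the
arguments below correspond to the arguments in §1.4 in Chapter 1). Let
$\Theta$ be an arbitrary set (probability space) and assign a general channel $\mathbf{W}_\theta =
\{W_\theta^n\}_{n=1}^{\infty}$ to each $\theta \in \Theta$. We assume here that, letting $\mathcal{X}$ and $\mathcal{Y}$ be an
input alphabet and an output alphabet, respectively, for all $n = 1, 2, \cdots$ and
for all measurable sets $B \subset \mathcal{Y}^n$, the conditional probability $W_\theta^n(B|\mathbf{x})$ is a
measurable function of $(\theta, \mathbf{x})$. If we arbitrarily fix a probability measure $w$
on $\Theta$, we have a channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ with the conditional probability
measure

$$W^n(B|\mathbf{x}) = \int_{\Theta} W_\theta^n(B|\mathbf{x})\,dw(\theta) \quad (\forall n = 1, 2, \cdots;\ \forall \mathbf{x} \in \mathcal{X}^n). \qquad (3.3.36)$$

We call this channel the mixed channel of a channel family $\{\mathbf{W}_\theta\}_{\theta\in\Theta}$. Now
we fix an input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ to this mixed channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ and
denote by $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ the channel output. Then, we define the following
two functions of $R$ by

$$\underline{J}_{\mathbf{W}}(R|\mathbf{X}) = \liminf_{n\to\infty} \Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R\right\}, \qquad (3.3.37)$$

$$\overline{J}_{\mathbf{W}}(R|\mathbf{X}) = \limsup_{n\to\infty} \Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R\right\} \qquad (3.3.38)$$

depending directly on the distribution of the information-spectrum of the
channel $\mathbf{W}$ and attempt to express these functions in terms of $w(\cdot)$. Since
this problem is hard if the channel $\mathbf{W}_\theta$ ($\theta \in \Theta$) is general, we assume that
both $\mathcal{X}$ and $\mathcal{Y}$ are finite and each $\mathbf{W}_\theta$ is stationary and memoryless subject
to a conditional probability distribution $W_\theta : \mathcal{X} \to \mathcal{Y}$ (we simply write $\mathbf{W}_\theta =
\{W_\theta\}$). Furthermore, under the assumption that the input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is
stationary and memoryless subject to a probability distribution $P_X$ over $\mathcal{X}$,
we have the following lemma. This lemma corresponds to Lemma 1.4.4 in
§1.4 for source coding.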






Lemma 3.3.3. Suppose that $\mathcal{X}$ and $\mathcal{Y}$ are finite input and output alphabets,
respectively. If each $\mathbf{W}_\theta = \{W_\theta\}$ is a stationary memoryless channel and an
input $\mathbf{X} = \{X\}$ is a stationary memoryless source, then it holds that

$$\int_{\{\theta \mid I(X;Y_\theta) < R\}} dw(\theta) \le \underline{J}_{\mathbf{W}}(R|\mathbf{X}) \le \overline{J}_{\mathbf{W}}(R|\mathbf{X}) \le \int_{\{\theta \mid I(X;Y_\theta) \le R\}} dw(\theta) \quad (\forall R \ge 0) \qquad (3.3.39)$$

for the mixed channel $\mathbf{W}$ defined by (3.3.36), where $Y_\theta$ denotes the output of
the channel $W_\theta$ with $X$ as the input and $I(X;Y_\theta)$ is the mutual information.
In addition, the inequalities in (3.3.39) hold with equalities except for at most
countably infinite $R$.

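Proof. For an input sequence and an output sequence

$$\mathbf{x} = (x_1, x_2, \cdots, x_n) \in \mathcal{X}^n,$$

$$\mathbf{y} = (y_1, y_2, \cdots, y_n) \in \mathcal{Y}^n,$$

we define the joint type $T_{\mathbf{x},\mathbf{y}}$ by

$$T_{\mathbf{x},\mathbf{y}}(x,y) = \frac{n(x,y)}{n} \quad (\forall x \in \mathcal{X}, \forall y \in \mathcal{Y}),$$

where $n(x,y)$ means the number of $i$'s satisfying $(x_i, y_i) = (x, y)$. Letting
$T_1, T_2, \cdots, T_{N_n}$ be all possible joint types, we have $N_n \le (n+1)^{|\mathcal{X}||\mathcal{Y}|}$ (cf.
Csiszár and Körner [19]). Since $\mathbf{W}_\theta$ is assumed to be stationary and memoryless
for each $\theta \in \Theta$, it holds that

$$W^n(\mathbf{y}|\mathbf{x}) = W^n(\mathbf{y}'|\mathbf{x}') \quad \text{for } T_{\mathbf{x},\mathbf{y}} = T_{\mathbf{x}',\mathbf{y}'}, \qquad (3.3.40)$$

$$W_\theta^n(\mathbf{y}|\mathbf{x}) = W_\theta^n(\mathbf{y}'|\mathbf{x}') \quad \text{for } T_{\mathbf{x},\mathbf{y}} = T_{\mathbf{x}',\mathbf{y}'}. \qquad (3.3.41)$$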
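The two facts about joint types used here, namely the polynomial bound on $N_n$ and the invariance (3.3.41), can be checked by enumeration for small $n$. The following sketch assumes alphabets coded as integers and a memoryless channel given by a hypothetical single-letter matrix `W[a][b]`; the function names are ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def joint_type(x, y):
    """Joint type T_{x,y}: relative frequency n(a,b)/n of each letter pair (a,b)."""
    n = len(x)
    counts = Counter(zip(x, y))
    return frozenset((pair, Fraction(c, n)) for pair, c in counts.items())

def block_probability(x, y, W):
    """W^n(y|x) for a memoryless channel with single-letter matrix W[a][b]."""
    prob = 1.0
    for a, b in zip(x, y):
        prob *= W[a][b]
    return prob

def count_joint_types(n, size_x, size_y):
    """Number N_n of distinct joint types over all pairs of length-n sequences."""
    types = {
        joint_type(x, y)
        for x in product(range(size_x), repeat=n)
        for y in product(range(size_y), repeat=n)
    }
    return len(types)
```

For binary alphabets and $n = 4$ the enumeration finds 35 joint types, well below the bound $(n+1)^{|\mathcal{X}||\mathcal{Y}|} = 625$.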

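By setting

$$\Theta(\mathbf{x},\mathbf{y}) = \{\theta \in \Theta \mid W_\theta^n(\mathbf{y}|\mathbf{x}) \le e^{\sqrt{n}}\,W^n(\mathbf{y}|\mathbf{x})\} \qquad (3.3.42)$$

for an arbitrary $(\mathbf{x},\mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n$ and applying Markov's inequality (see
Remark 1.1.1 in §1.1) to (3.3.36), we obtain

$$\Pr\{\theta \in \Theta(\mathbf{x},\mathbf{y})\} \ge 1 - e^{-\sqrt{n}}, \qquad (3.3.43)$$

where the probability on the left-hand side is measured with respect to $w(\cdot)$.
On the other hand, in view of (3.3.40) and (3.3.41) $\Theta(\mathbf{x},\mathbf{y})$ depends only on
the joint type $T_{\mathbf{x},\mathbf{y}}$ of $(\mathbf{x},\mathbf{y})$. That is, if $T_{\mathbf{x},\mathbf{y}} = T_{\mathbf{x}',\mathbf{y}'}$, then $\Theta(\mathbf{x},\mathbf{y}) = \Theta(\mathbf{x}',\mathbf{y}')$.
This means that $\Theta(\mathbf{x},\mathbf{y})$ can be written as $\Theta(\mathcal{T}_k)$ ($T_k = T_{\mathbf{x},\mathbf{y}}$), where

$$\mathcal{T}_k = \{(\mathbf{x},\mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n \mid T_{\mathbf{x},\mathbf{y}} = T_k\} \quad (k = 1, 2, \cdots, N_n).$$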






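Noting that

$$\mathcal{X}^n \times \mathcal{Y}^n = \bigcup_{k=1}^{N_n} \mathcal{T}_k$$

and defining

$$\Theta_n^* = \bigcap_{k=1}^{N_n} \Theta(\mathcal{T}_k),$$

it follows from (3.3.42) and (3.3.43) that

$$\Pr\{\theta \in \Theta_n^*\} \ge 1 - (n+1)^{|\mathcal{X}||\mathcal{Y}|} e^{-\sqrt{n}} \qquad (3.3.44)$$

and

$$W_\theta^n(\mathbf{y}|\mathbf{x}) \le e^{\sqrt{n}}\,W^n(\mathbf{y}|\mathbf{x}) \quad (\forall \theta \in \Theta_n^*,\ \forall (\mathbf{x},\mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n), \qquad (3.3.45)$$

where $N_n \le (n+1)^{|\mathcal{X}||\mathcal{Y}|}$ is used to obtain (3.3.44). Furthermore, denoting
by $\mathbf{Y}_\theta = \{Y_\theta^n\}_{n=1}^{\infty}$ the output of the channel $\mathbf{W}_\theta = \{W_\theta^n\}_{n=1}^{\infty}$ corresponding
to the input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$, due to (3.3.45) we have

$$P_{Y_\theta^n}(\mathbf{y}) = \sum_{\mathbf{x}\in\mathcal{X}^n} W_\theta^n(\mathbf{y}|\mathbf{x}) P_{X^n}(\mathbf{x})
\le e^{\sqrt{n}} \sum_{\mathbf{x}\in\mathcal{X}^n} W^n(\mathbf{y}|\mathbf{x}) P_{X^n}(\mathbf{x})
= e^{\sqrt{n}} P_{Y^n}(\mathbf{y})$$

for $\theta \in \Theta_n^*$, i.e.,

$$P_{Y_\theta^n}(\mathbf{y}) \le e^{\sqrt{n}} P_{Y^n}(\mathbf{y}) \quad (\forall \theta \in \Theta_n^*,\ \forall \mathbf{y} \in \mathcal{Y}^n). \qquad (3.3.46)$$

In addition, (3.3.36) leads to

$$P_{X^nY^n}(\mathbf{x},\mathbf{y}) = \int_{\Theta} P_{X^nY_\theta^n}(\mathbf{x},\mathbf{y})\,dw(\theta) \quad (\forall (\mathbf{x},\mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n). \qquad (3.3.47)$$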

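1) First, we prove

$$\int_{\{\theta \mid I(X;Y_\theta) < R\}} dw(\theta) \le \underline{J}_{\mathbf{W}}(R|\mathbf{X}). \qquad (3.3.48)$$

Notice here that it holds that

$$\Pr\left\{\frac{1}{n}\log W_\theta^n(Y_\theta^n|X^n) \ge \frac{1}{n}\log W^n(Y_\theta^n|X^n) - \gamma_n\right\} \ge 1 - e^{-n\gamma_n} \qquad (3.3.49)$$

for any $\theta \in \Theta$ as can be seen from the proof of Lemma 1.4.1, where

$$\gamma_n \to 0,\quad n\gamma_n \to \infty \quad \text{as } n \to \infty. \qquad (3.3.50)$$

Then, in view of (3.3.46) and (3.3.49) we have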









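$$\begin{aligned}
&\Pr\left\{\frac{1}{n}\log\frac{W^n(Y_\theta^n|X^n)}{P_{Y^n}(Y_\theta^n)} \le R\right\} \\
&= \Pr\left\{\frac{1}{n}\log W^n(Y_\theta^n|X^n) - \frac{1}{n}\log P_{Y^n}(Y_\theta^n) \le R\right\} \\
&\ge \Pr\left\{\frac{1}{n}\log W_\theta^n(Y_\theta^n|X^n) - \frac{1}{n}\log P_{Y^n}(Y_\theta^n) \le R - \gamma_n\right\} - e^{-n\gamma_n} \\
&\ge \Pr\left\{\frac{1}{n}\log W_\theta^n(Y_\theta^n|X^n) - \frac{1}{n}\log P_{Y_\theta^n}(Y_\theta^n) \le R - \gamma_n - \frac{1}{\sqrt{n}}\right\} - e^{-n\gamma_n} \\
&= \Pr\left\{\frac{1}{n}\log\frac{W_\theta^n(Y_\theta^n|X^n)}{P_{Y_\theta^n}(Y_\theta^n)} \le R - \gamma_n - \frac{1}{\sqrt{n}}\right\} - e^{-n\gamma_n}
\end{aligned} \qquad (3.3.51)$$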



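for all $\theta \in \Theta_n^*$. Since it follows from (3.3.44), (3.3.47) and (3.3.51) that

$$\begin{aligned}
&\Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R\right\} \\
&= \int_{\Theta} \Pr\left\{\frac{1}{n}\log\frac{W^n(Y_\theta^n|X^n)}{P_{Y^n}(Y_\theta^n)} \le R\right\} dw(\theta) \\
&\ge \int_{\Theta_n^*} \Pr\left\{\frac{1}{n}\log\frac{W^n(Y_\theta^n|X^n)}{P_{Y^n}(Y_\theta^n)} \le R\right\} dw(\theta) \\
&\ge \int_{\Theta_n^*} \Pr\left\{\frac{1}{n}\log\frac{W_\theta^n(Y_\theta^n|X^n)}{P_{Y_\theta^n}(Y_\theta^n)} \le R - \gamma_n - \frac{1}{\sqrt{n}}\right\} dw(\theta) - e^{-n\gamma_n} \\
&\ge \int_{\Theta} \Pr\left\{\frac{1}{n}\log\frac{W_\theta^n(Y_\theta^n|X^n)}{P_{Y_\theta^n}(Y_\theta^n)} \le R - \gamma_n - \frac{1}{\sqrt{n}}\right\} dw(\theta) - e^{-n\gamma_n} - (n+1)^{|\mathcal{X}||\mathcal{Y}|} e^{-\sqrt{n}},
\end{aligned} \qquad (3.3.52)$$

we obtain

$$\liminf_{n\to\infty} \Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R\right\}
\ge \int_{\Theta} \liminf_{n\to\infty} \Pr\left\{\frac{1}{n}\log\frac{W_\theta^n(Y_\theta^n|X^n)}{P_{Y_\theta^n}(Y_\theta^n)} \le R - \gamma_n - \frac{1}{\sqrt{n}}\right\} dw(\theta) \qquad (3.3.53)$$

from Fatou's lemma. On the other hand, since both the channel $\mathbf{W}_\theta = \{W_\theta\}$
and the source $\mathbf{X} = \{X\}$ are stationary and memoryless, we have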






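$$\lim_{n\to\infty} \Pr\left\{\frac{1}{n}\log\frac{W_\theta^n(Y_\theta^n|X^n)}{P_{Y_\theta^n}(Y_\theta^n)} \le R - \gamma_n - \frac{1}{\sqrt{n}}\right\} = 0$$

for $R < I(X;Y_\theta)$ and

$$\lim_{n\to\infty} \Pr\left\{\frac{1}{n}\log\frac{W_\theta^n(Y_\theta^n|X^n)}{P_{Y_\theta^n}(Y_\theta^n)} \le R - \gamma_n - \frac{1}{\sqrt{n}}\right\} = 1$$

for $R > I(X;Y_\theta)$. Therefore, (3.3.53) leads to

$$\underline{J}_{\mathbf{W}}(R|\mathbf{X}) = \liminf_{n\to\infty} \Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R\right\} \ge \int_{\{\theta \mid I(X;Y_\theta) < R\}} dw(\theta),$$

which is nothing but the inequality (3.3.48).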

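2) Next, we need to develop

$$\overline{J}_{\mathbf{W}}(R|\mathbf{X}) \le \int_{\{\theta \mid I(X;Y_\theta) \le R\}} dw(\theta). \qquad (3.3.54)$$

However, this inequality can be developed similarly to the proof 1) above by
using (3.3.45) and

$$\Pr\left\{\frac{1}{n}\log P_{Y_\theta^n}(Y_\theta^n) \ge \frac{1}{n}\log P_{Y^n}(Y_\theta^n) - \gamma_n\right\} \ge 1 - e^{-n\gamma_n}$$

for any $\theta \in \Theta$. □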



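Remark 3.3.2. For the input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and the output $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$
of the mixed channel $\mathbf{W}$ corresponding to $\mathbf{X}$ defined in Lemma 3.3.3, the
mutual information rate can be computed as

$$I(\mathbf{X};\mathbf{Y}) = \lim_{n\to\infty} \frac{1}{n} I(X^n;Y^n) = \int_{\Theta} I(X;Y_\theta)\,dw(\theta). \qquad (3.3.55)$$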

Lemma 3.3.3 immediately yields the following theorem. 

Theorem 3.3.6. Suppose that $\mathcal{X}$ and $\mathcal{Y}$ are finite input and output alphabets,
respectively. If each $\mathbf{W}_\theta = \{W_\theta\}$ is a stationary memoryless channel
and the input $\mathbf{X} = \{X\}$ is a stationary memoryless source, then it holds that

$$\underline{I}(\mathbf{X};\mathbf{Y}) = w\text{-ess.inf}\ I(X;Y_\theta) \qquad (3.3.56)$$

for the mixed channel $\mathbf{W}$ defined by (3.3.36), where $Y_\theta$ and $\mathbf{Y}$ denote the
outputs from the channels $W_\theta$ and $\mathbf{W}$ corresponding to the inputs $X$ and $\mathbf{X}$,
respectively.




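Proof. If $R > w\text{-ess.inf}\ I(X;Y_\theta)$, the leftmost inequality in (3.3.39) in Lemma
3.3.3 yields

$$0 < \int_{\{\theta \mid I(X;Y_\theta) < R\}} dw(\theta) \le \underline{J}_{\mathbf{W}}(R|\mathbf{X}).$$

In addition, if $R < w\text{-ess.inf}\ I(X;Y_\theta)$, the rightmost inequality in (3.3.39)
yields

$$\overline{J}_{\mathbf{W}}(R|\mathbf{X}) \le \int_{\{\theta \mid I(X;Y_\theta) \le R\}} dw(\theta) = 0.$$

Therefore, $\underline{I}(\mathbf{X};\mathbf{Y}) = w\text{-ess.inf}\ I(X;Y_\theta)$. □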

Now we are ready to give a formula for the channel capacity of the mixed 
channel W defined by (3.3.36). 

Theorem 3.3.7. Suppose that $\mathcal{X}$ and $\mathcal{Y}$ are finite input and output alphabets,
respectively. If each $\mathbf{W}_\theta = \{W_\theta\}$ is stationary and memoryless, then the
channel capacity $C(\mathbf{W})$ of the mixed channel $\mathbf{W}$ defined by (3.3.36) is given
by

$$C(\mathbf{W}) = \sup_{X}\ w\text{-ess.inf}\ I(X;Y_\theta), \qquad (3.3.57)$$

where $Y_\theta$ denotes the output of the channel $W_\theta$ with $X$ as the input. □



Remark 3.3.3. Ahlswede [2] gives the following formula for the channel
capacity of the mixed channel $\mathbf{W}$ defined in Theorem 3.3.7:

$$C(\mathbf{W}) = \inf_{0<\alpha<1}\ \max_{X}\ \sup_{\{S \mid S \subset \Theta,\ w(S) \ge 1-\alpha\}}\ \inf_{\theta\in S} I(X;Y_\theta), \qquad (3.3.58)$$

which is more complicated than the formula given in Theorem 3.3.7 and is
obtained in a completely different way without using the information-spectrum
arguments below. However, it is easy to verify that the two formulae coincide.
□

Remark 3.3.4. If we consider the compound channel $\mathcal{W}$ of a family $\{\mathbf{W}_\theta\}_{\theta\in\Theta}$ of
stationary memoryless channels with finite input and output alphabets
$\mathcal{X}$ and $\mathcal{Y}$, respectively, its channel capacity $C(\mathcal{W})$ is given by

$$C(\mathcal{W}) = \sup_{X} \inf_{\theta\in\Theta} I(X;Y_\theta) \qquad (3.3.59)$$

(cf. Wolfowitz [102]). In general, this channel capacity is smaller than or
equal to the channel capacity of the mixed channel given by (3.3.57). That
is, for the general cases that $\Theta$ is not countable, the channel capacity of the
compound channel does not always coincide with the capacity of the corresponding
mixed channel (see also Theorem 3.3.5). □
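The gap between (3.3.57) and (3.3.59) comes from $w$-null sets: the essential infimum ignores them, whereas the plain infimum does not. The sketch below illustrates this with a deliberately simple stand-in, a finite family of binary symmetric channels in which one member carries weight zero and so plays the role of a $w$-null set. The finite family, the grid search and all function names are our assumptions, not the construction of the text.

```python
import numpy as np

def mutual_information(p_x, W):
    """I(X;Y) in nats for an input distribution p_x and a channel matrix W[x, y]."""
    p_x = np.asarray(p_x, dtype=float)
    W = np.asarray(W, dtype=float)
    p_xy = p_x[:, None] * W
    p_y = p_xy.sum(axis=0)
    denom = p_x[:, None] * p_y[None, :]
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / denom[mask])))

def bsc(theta):
    """Single-letter matrix of a binary symmetric channel with crossover theta."""
    return np.array([[1.0 - theta, theta], [theta, 1.0 - theta]])

def mixed_and_compound_capacity(channels, weights, grid=2001):
    """Grid-search sup_X ess.inf and sup_X inf of I(X;Y_theta) for a discrete w."""
    mixed = compound = 0.0
    for p in np.linspace(0.0, 1.0, grid):
        p_x = np.array([1.0 - p, p])
        rates = [mutual_information(p_x, W) for W in channels]
        # For a discrete measure, the w-essential infimum is the minimum over
        # the members that carry positive weight.
        ess_inf = min(r for r, w in zip(rates, weights) if w > 0)
        mixed = max(mixed, ess_inf)
        compound = max(compound, min(rates))
    return mixed, compound
```

With crossover probabilities 0.05, 0.1 and 0.4 and weights 0.6, 0.4 and 0, the first value is governed by the crossover 0.1 and the second by the crossover 0.4, so the compound value is strictly smaller.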






Proof of Theorem 3.3.7.

1) Direct part:

Since (3.3.56) in Theorem 3.3.6 holds for any input variable $X$ over the
input alphabet $\mathcal{X}$, we obtain

$$C(\mathbf{W}) \ge \sup_{X}\ w\text{-ess.inf}\ I(X;Y_\theta) \qquad (3.3.60)$$

from Theorem 3.2.1.



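2) Converse part:

The basic idea of the proof consists in analyzing in detail the arguments
given in the proof of Lemma 3.2.3 from the viewpoint of the information-spectrum.
First, let

$$\mathbf{X} = \{X^n = (X_1^{(n)}, \cdots, X_n^{(n)})\}_{n=1}^{\infty}$$

be an arbitrary input process and

$$\mathbf{Y} = \{Y^n = (Y_1^{(n)}, \cdots, Y_n^{(n)})\}_{n=1}^{\infty},\quad \mathbf{Y}_\theta = \{Y_\theta^n = (Y_{\theta,1}^{(n)}, \cdots, Y_{\theta,n}^{(n)})\}_{n=1}^{\infty}$$

the output processes from channels $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ and $\mathbf{W}_\theta = \{W_\theta^n\}_{n=1}^{\infty}$
corresponding to $\mathbf{X}$, respectively. We define the memoryless input process

$$\tilde{\mathbf{X}} = \{\tilde{X}^n = (\tilde{X}_1^{(n)}, \cdots, \tilde{X}_n^{(n)})\}_{n=1}^{\infty}$$

from $\mathbf{X} = \{X^n = (X_1^{(n)}, \cdots, X_n^{(n)})\}_{n=1}^{\infty}$ by

$$P_{\tilde{X}^n}(\mathbf{x}) = P_{X_1^{(n)}}(x_1) \cdots P_{X_n^{(n)}}(x_n) \quad (\mathbf{x} = (x_1, x_2, \cdots, x_n)).$$

Denote by

$$\tilde{\mathbf{Y}}_\theta = \{\tilde{Y}_\theta^n = (\tilde{Y}_{\theta,1}^{(n)}, \cdots, \tilde{Y}_{\theta,n}^{(n)})\}_{n=1}^{\infty}$$

the output process from the channel $\mathbf{W}_\theta$ corresponding to $\tilde{\mathbf{X}}$.
Note here that the output process $\tilde{\mathbf{Y}}_\theta$ is memoryless and satisfies

$$P_{\tilde{X}_i^{(n)}\tilde{Y}_{\theta,i}^{(n)}}(x,y) = P_{X_i^{(n)}Y_{\theta,i}^{(n)}}(x,y) \quad (i = 1, 2, \cdots, n;\ \forall (x,y) \in \mathcal{X} \times \mathcal{Y}). \qquad (3.3.61)$$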



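We notice here that, as was given in the proof of Lemma 1.4.1, we have

$$\Pr\left\{\frac{1}{n}\log P_{Y_\theta^n}(Y_\theta^n) \ge \frac{1}{n}\log P_{\tilde{Y}_\theta^n}(Y_\theta^n) - \gamma_n\right\} \ge 1 - e^{-n\gamma_n} \qquad (3.3.62)$$

for $\gamma_n$ satisfying

$$\gamma_n \to 0,\quad n\gamma_n \to \infty \quad \text{as } n \to \infty.$$

Hence, letting $R$ be an arbitrary real number, we have

$$\Pr\left\{\frac{1}{n}\log\frac{W_\theta^n(Y_\theta^n|X^n)}{P_{Y_\theta^n}(Y_\theta^n)} \le R + \gamma_n\right\} \ge \Pr\left\{\frac{1}{n}\log\frac{W_\theta^n(Y_\theta^n|X^n)}{P_{\tilde{Y}_\theta^n}(Y_\theta^n)} \le R\right\} - e^{-n\gamma_n} \qquad (3.3.63)$$

for all $\theta \in \Theta$. Setting

$$Z_n(\theta) = \frac{1}{n}\log\frac{W_\theta^n(Y_\theta^n|X^n)}{P_{\tilde{Y}_\theta^n}(Y_\theta^n)}$$

for simplicity, (3.3.63) can be written as

$$\Pr\left\{\frac{1}{n}\log\frac{W_\theta^n(Y_\theta^n|X^n)}{P_{Y_\theta^n}(Y_\theta^n)} \le R + \gamma_n\right\} \ge \Pr\{Z_n(\theta) \le R\} - e^{-n\gamma_n}. \qquad (3.3.64)$$

Then, (3.3.46) and (3.3.49) in the proof of Lemma 3.3.3 imply that (3.3.64)
can be evaluated in the following form for all $\theta \in \Theta_n^*$:

$$\Pr\left\{\frac{1}{n}\log\frac{W^n(Y_\theta^n|X^n)}{P_{Y^n}(Y_\theta^n)} \le R + 2\gamma_n + \frac{1}{\sqrt{n}}\right\} \ge \Pr\{Z_n(\theta) \le R\} - 2e^{-n\gamma_n}. \qquad (3.3.65)$$

Next, we evaluate the right-hand side of (3.3.65). By noticing (3.3.61) and
using the definition of $\tilde{\mathbf{Y}}_\theta = \{\tilde{Y}_\theta^n\}_{n=1}^{\infty}$ and the assumption that the channel
$\mathbf{W}_\theta = \{W_\theta\}$ is memoryless, $Z_n(\theta)$ can be expressed as

$$Z_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} \log\frac{W_\theta(Y_{\theta,i}^{(n)}|X_i^{(n)})}{P_{Y_{\theta,i}^{(n)}}(Y_{\theta,i}^{(n)})}. \qquad (3.3.66)$$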



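This means that the expectation of $Z_n(\theta)$ is written as

$$\mathrm{E}Z_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} I(X_i^{(n)};Y_{\theta,i}^{(n)}). \qquad (3.3.67)$$

If we introduce the random variable $Q^{(n)}$ satisfying $\Pr\{Q^{(n)} = i\} = \frac{1}{n}$ for
$i = 1, 2, \cdots, n$ and consider the pair of random variables $(X^{(n)}, Y_\theta^{(n)})$ satisfying
$X^{(n)} = X_i^{(n)}$ and $Y_\theta^{(n)} = Y_{\theta,i}^{(n)}$ if $Q^{(n)} = i$, (3.3.67) can be evaluated in
the following manner:

$$\mathrm{E}Z_n(\theta) = I(X^{(n)};Y_\theta^{(n)}|Q^{(n)}) \le I(X^{(n)}Q^{(n)};Y_\theta^{(n)}) = I(X^{(n)};Y_\theta^{(n)}),$$

where the second equality follows from the Markov chain $Q^{(n)} \to X^{(n)} \to Y_\theta^{(n)}$.
Then, Lemma 3.2.4 in §3.2 yields

$$\mathrm{E}\{Z_n(\theta)\mathbf{1}[Z_n(\theta) \ge 0]\} \le I(X^{(n)};Y_\theta^{(n)}) + \frac{\log e}{en}. \qquad (3.3.68)$$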

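Now, let $\{P_{X^{(n_k)}}\}$ ($k = 1, 2, \cdots$) be a subsequence of a sequence of probability
distributions ($|\mathcal{X}|$-dimensional vectors) $P_{X^{(n)}}$ ($n = 1, 2, \cdots$) converging to a
limit $P_{\overline{X}}$, where $n_k \to \infty$ as $k \to \infty$. Denoting by $\overline{Y}_\theta$ the output of the
channel $W_\theta$ corresponding to the input $\overline{X}$, we have

$$I(X^{(n_k)};Y_\theta^{(n_k)}) \to I(\overline{X};\overline{Y}_\theta) \quad \text{as } k \to \infty.$$

Then, (3.3.68) implies

$$\mathrm{E}\{Z_{n_k}(\theta)\mathbf{1}[Z_{n_k}(\theta) \ge 0]\} \le I(\overline{X};\overline{Y}_\theta) + \delta \quad (\forall k \ge k_0;\ \forall \theta \in \Theta),$$

where $\delta > 0$ is an arbitrarily small constant. We note here that $k_0$ does
not depend on $\theta$ because for all $\theta$ the mutual information $I(X;Y_\theta)$ is uniformly
continuous with respect to the probability distribution of $X$ due to
the assumption that $\mathcal{X}$ and $\mathcal{Y}$ are finite sets. Then it follows from Markov's
inequality that

$$\Pr\{Z_{n_k}(\theta)\mathbf{1}[Z_{n_k}(\theta) \ge 0] \le (1+\delta)(I(\overline{X};\overline{Y}_\theta) + \delta)\} \ge \frac{\delta}{1+\delta} \quad (\forall k \ge k_0;\ \forall \theta \in \Theta),$$

which leads to

$$\Pr\{Z_{n_k}(\theta) \le (1+\delta)(I(\overline{X};\overline{Y}_\theta) + \delta)\} \ge \frac{\delta}{1+\delta} \quad (\forall k \ge k_0;\ \forall \theta \in \Theta). \qquad (3.3.69)$$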

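On the other hand, setting

$$\Theta(R) = \{\theta \mid R > I(\overline{X};\overline{Y}_\theta)\},$$

it holds that

$$\int_{\Theta(R)} dw(\theta) > 0 \quad (\text{if } R > w\text{-ess.inf}\ I(\overline{X};\overline{Y}_\theta)) \qquad (3.3.70)$$

owing to the definition of $w\text{-ess.inf}\ I(\overline{X};\overline{Y}_\theta)$. We now fix an arbitrary $R$ with
$R > w\text{-ess.inf}\ I(\overline{X};\overline{Y}_\theta)$. If a sufficiently small $\tau > 0$ and a $\delta > 0$ are chosen,
we can choose $R$ satisfying both

$$R - \tau > w\text{-ess.inf}\ I(\overline{X};\overline{Y}_\theta) \qquad (3.3.71)$$

and

$$(1+\delta)(I(\overline{X};\overline{Y}_\theta) + \delta) \le R \quad (\forall \theta \in \Theta(R-\tau)).$$

Then, from (3.3.69) we obtain

$$\Pr\{Z_{n_k}(\theta) \le R\} \ge \frac{\delta}{1+\delta} \quad (\forall k \ge k_0;\ \forall \theta \in \Theta(R-\tau)), \qquad (3.3.72)$$

which, together with (3.3.65), leads to

$$\Pr\left\{\frac{1}{n_k}\log\frac{W^{n_k}(Y_\theta^{n_k}|X^{n_k})}{P_{Y^{n_k}}(Y_\theta^{n_k})} \le R + 2\gamma_{n_k} + \frac{1}{\sqrt{n_k}}\right\} \ge \frac{\delta}{1+\delta} - 2e^{-n_k\gamma_{n_k}} \quad (\forall k \ge k_0;\ \forall \theta \in \Theta(R-\tau) \cap \Theta_{n_k}^*). \qquad (3.3.73)$$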



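The combination of (3.3.62), (3.3.73) and (3.3.47) in the proof of Lemma 3.3.3
yields

$$\begin{aligned}
&\Pr\left\{\frac{1}{n_k}\log\frac{W^{n_k}(Y^{n_k}|X^{n_k})}{P_{Y^{n_k}}(Y^{n_k})} \le R + 2\gamma_{n_k} + \frac{1}{\sqrt{n_k}}\right\} \\
&= \int_{\Theta} \Pr\left\{\frac{1}{n_k}\log\frac{W^{n_k}(Y_\theta^{n_k}|X^{n_k})}{P_{Y^{n_k}}(Y_\theta^{n_k})} \le R + 2\gamma_{n_k} + \frac{1}{\sqrt{n_k}}\right\} dw(\theta) \\
&\ge \int_{\Theta(R-\tau)\cap\Theta_{n_k}^*} \Pr\left\{\frac{1}{n_k}\log\frac{W^{n_k}(Y_\theta^{n_k}|X^{n_k})}{P_{Y^{n_k}}(Y_\theta^{n_k})} \le R + 2\gamma_{n_k} + \frac{1}{\sqrt{n_k}}\right\} dw(\theta) \\
&\ge \int_{\Theta(R-\tau)\cap\Theta_{n_k}^*} \left(\frac{\delta}{1+\delta} - 2e^{-n_k\gamma_{n_k}}\right) dw(\theta) \\
&\ge \frac{\delta}{2(1+\delta)} \int_{\Theta(R-\tau)\cap\Theta_{n_k}^*} dw(\theta)
\end{aligned} \qquad (3.3.74)$$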

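($\forall k \ge k_1$). We note here that from (3.3.71)

$$\int_{\Theta(R-\tau)} dw(\theta) > 0 \qquad (3.3.75)$$

must hold. Then, since (3.3.44) in the proof of Lemma 3.3.3 implies

$$\int_{\Theta(R-\tau)\cap\Theta_{n_k}^*} dw(\theta) \ge \frac{2}{3} \int_{\Theta(R-\tau)} dw(\theta) \quad (\forall k \ge k_1),$$

(3.3.74) can be written as

$$\Pr\left\{\frac{1}{n_k}\log\frac{W^{n_k}(Y^{n_k}|X^{n_k})}{P_{Y^{n_k}}(Y^{n_k})} \le R + 2\gamma_{n_k} + \frac{1}{\sqrt{n_k}}\right\} \ge \frac{\delta}{3(1+\delta)} \int_{\Theta(R-\tau)} dw(\theta) \qquad (3.3.76)$$

($\forall k \ge k_2$). Therefore, by noticing (3.3.75) again, it follows that

$$\liminf_{k\to\infty} \Pr\left\{\frac{1}{n_k}\log\frac{W^{n_k}(Y^{n_k}|X^{n_k})}{P_{Y^{n_k}}(Y^{n_k})} \le R + 2\gamma_{n_k} + \frac{1}{\sqrt{n_k}}\right\} > 0. \qquad (3.3.77)$$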






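Since $R$ satisfying $R > w\text{-ess.inf}\ I(\overline{X};\overline{Y}_\theta)$ can be arbitrarily chosen, (3.3.77)
means that

$$\underline{I}(\mathbf{X};\mathbf{Y}) \le w\text{-ess.inf}\ I(\overline{X};\overline{Y}_\theta),$$

which yields

$$\sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y}) \le \sup_{X}\ w\text{-ess.inf}\ I(X;Y_\theta).$$

Then, in view of Theorem 3.2.1 we finally obtain

$$C(\mathbf{W}) \le \sup_{X}\ w\text{-ess.inf}\ I(X;Y_\theta). \qquad □$$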

3.4 ε-Channel Coding

In §3.2 we have considered coding of a general channel under the requirement
that the error probability $\varepsilon_n$ satisfies $\lim_{n\to\infty} \varepsilon_n = 0$. In this section we weaken
this requirement; we require only

$$\limsup_{n\to\infty} \varepsilon_n \le \varepsilon$$

for an arbitrary constant $0 \le \varepsilon < 1$. We consider coding of a general channel
$\mathbf{W}$ under this weakened requirement on the error probability. We can expect
the channel capacity to increase under this requirement.

We begin with giving definitions. 

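Definition 3.4.1.

Rate $R$ is ε-achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists an $(n, M_n, \varepsilon_n)$-code satisfying

$$\limsup_{n\to\infty} \varepsilon_n \le \varepsilon \quad \text{and} \quad \liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R.$$

Definition 3.4.2 (ε-Channel Capacity).

$$C(\varepsilon|\mathbf{W}) = \sup\{R \mid R \text{ is } \varepsilon\text{-achievable}\}.$$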



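In order to find a formula for $C(\varepsilon|\mathbf{W})$, let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ be an arbitrary
general source and $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ the output of a channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$
with $\mathbf{X}$ as an input as defined by

$$P_{X^nY^n}(\mathbf{x},\mathbf{y}) = P_{X^n}(\mathbf{x})\,W^n(\mathbf{y}|\mathbf{x}).$$

In addition, we define:

Definition 3.4.3.

$$J(R|\mathbf{X}) = \limsup_{n\to\infty} \Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R\right\} \qquad (3.4.1)$$

(Fig. 3.2). Actually, $J(R|\mathbf{X})$ is nothing but $\overline{J}_{\mathbf{W}}(R|\mathbf{X})$ defined in §3.3. Then,
we have the following theorem.

Fig. 3.2. $J(R|\mathbf{X})$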

Theorem 3.4.1 (Verdú and Han [91]). The ε-channel capacity $C(\varepsilon|\mathbf{W})$ of
a channel $\mathbf{W}$ is given by

$$C(\varepsilon|\mathbf{W}) = \sup_{\mathbf{X}} \sup\{R \mid J(R|\mathbf{X}) \le \varepsilon\} \quad (0 \le \forall\varepsilon < 1). \qquad (3.4.2)$$

Remark 3.4.1. The right-hand side of (3.4.2) is a right-continuous and
monotone increasing function of $\varepsilon$. In addition, as is easily verified, the formula
(3.4.2) with $\varepsilon = 0$ coincides with the formula (3.2.4) in Theorem 3.2.1. □



Theorem 3.4.1 is proved by using Lemma 3.2.2 and another lemma given 
below that corresponds to Lemma 1.3.1 for source coding (this lemma is 
generalized to Lemma 3.8.1 in §3.8). 



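Lemma 3.4.1 (Feinstein [28]). Let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ be an arbitrary input to
a channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ and $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ the output from the channel
corresponding to $\mathbf{X}$. Given an arbitrary positive integer $M_n$, there exists an
$(n, M_n, \varepsilon_n)$-code satisfying

$$\varepsilon_n \le \Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n}\log M_n + \gamma\right\} + e^{-n\gamma} \qquad (3.4.3)$$

for all $n = 1, 2, \cdots$, where $\gamma > 0$ is an arbitrary constant.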



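Proof. The proof given here is by Ash [6]. First, set

$$i(\mathbf{x};\mathbf{y}) = \log\frac{W^n(\mathbf{y}|\mathbf{x})}{P_{Y^n}(\mathbf{y})} \quad (\mathbf{x} \in \mathcal{X}^n, \mathbf{y} \in \mathcal{Y}^n) \qquad (3.4.4)$$

and $\alpha = M_n e^{n\gamma}$. Define

$$\lambda_n = \Pr\{i(X^n;Y^n) \le \log\alpha\} + \frac{M_n}{\alpha}. \qquad (3.4.5)$$

Since the right-hand side of (3.4.5) coincides with the right-hand side of
(3.4.3), in order to establish the claim of the lemma we have only to show
the existence of an $(n, M_n, \varepsilon_n)$-code satisfying $\varepsilon_n \le \lambda_n$. Hereafter, we assume
that $\lambda_n < 1$ because the claim of the lemma is trivial if $\lambda_n \ge 1$. Setting

$$B(\mathbf{x}) = \{\mathbf{y} \in \mathcal{Y}^n \mid i(\mathbf{x};\mathbf{y}) > \log\alpha\}$$

for $\mathbf{x} \in \mathcal{X}^n$, we first prove that

$$\Pr\{Y^n \in B(\mathbf{x})\} \le \frac{1}{\alpha} \qquad (3.4.6)$$

for any $\mathbf{x} \in \mathcal{X}^n$. Since $W^n(\mathbf{y}|\mathbf{x}) > \alpha P_{Y^n}(\mathbf{y})$ for $\mathbf{y} \in B(\mathbf{x})$, it follows that

$$1 \ge W^n(B(\mathbf{x})|\mathbf{x}) = \sum_{\mathbf{y}\in B(\mathbf{x})} W^n(\mathbf{y}|\mathbf{x}) \ge \alpha \sum_{\mathbf{y}\in B(\mathbf{x})} P_{Y^n}(\mathbf{y}) = \alpha \Pr\{Y^n \in B(\mathbf{x})\},$$

which implies (3.4.6).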

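Now, we iteratively construct a code $\mathcal{C}_n$ for the channel $W^n$ in the following
way. First, we arbitrarily choose $\mathbf{u}_1 \in \mathcal{X}^n$ satisfying

$$W^n(B(\mathbf{u}_1)|\mathbf{u}_1) > 1 - \lambda_n$$

and set $\mathcal{D}_1 = B(\mathbf{u}_1)$. Next, we arbitrarily choose $\mathbf{u}_2 \in \mathcal{X}^n$ satisfying

$$W^n(B(\mathbf{u}_2) - B(\mathbf{u}_1)|\mathbf{u}_2) > 1 - \lambda_n$$

and set $\mathcal{D}_2 = B(\mathbf{u}_2) - B(\mathbf{u}_1)$. We repeat this operation as far as possible.
Suppose that this operation stops after the $L_n$-th repetition. Then, while we
have

$$W^n\Big(B(\mathbf{u}_k) - \bigcup_{i=1}^{k-1} B(\mathbf{u}_i)\,\Big|\,\mathbf{u}_k\Big) > 1 - \lambda_n \qquad (3.4.7)$$

for $k = 1, 2, \cdots, L_n$, it holds that

$$W^n\Big(B(\mathbf{x}) - \bigcup_{i=1}^{L_n} B(\mathbf{u}_i)\,\Big|\,\mathbf{x}\Big) \le 1 - \lambda_n \qquad (3.4.8)$$

for all $\mathbf{x} \in \mathcal{X}^n$ from the definition of $L_n$. In addition, setting

$$\mathcal{D}_k = B(\mathbf{u}_k) - \bigcup_{i=1}^{k-1} B(\mathbf{u}_i),$$

(3.4.7) can be written as

$$W^n(\mathcal{D}_k|\mathbf{u}_k) > 1 - \lambda_n \quad (k = 1, 2, \cdots, L_n). \qquad (3.4.9)$$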

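We now consider the code $\mathcal{C}_n = \{\mathbf{u}_1, \cdots, \mathbf{u}_{L_n}\}$ with the decoding region $\mathcal{D}_k$
for $\mathbf{u}_k$. Setting $\mathcal{D} = \bigcup_{k=1}^{L_n} \mathcal{D}_k$, it follows that

$$\begin{aligned}
\Pr\{i(X^n;Y^n) > \log\alpha\}
&= \sum_{\mathbf{x}\in\mathcal{X}^n} W^n(B(\mathbf{x})|\mathbf{x}) P_{X^n}(\mathbf{x}) \\
&= \sum_{\mathbf{x}\in\mathcal{X}^n} W^n(B(\mathbf{x}) \cap \mathcal{D}\,|\,\mathbf{x}) P_{X^n}(\mathbf{x}) + \sum_{\mathbf{x}\in\mathcal{X}^n} W^n(B(\mathbf{x}) \cap \mathcal{D}^c\,|\,\mathbf{x}) P_{X^n}(\mathbf{x}).
\end{aligned} \qquad (3.4.10)$$

We notice here that the first term on the right-hand side of (3.4.10) can be
evaluated in the following way:

$$\begin{aligned}
\sum_{\mathbf{x}\in\mathcal{X}^n} W^n(B(\mathbf{x}) \cap \mathcal{D}\,|\,\mathbf{x}) P_{X^n}(\mathbf{x})
&\le \sum_{\mathbf{x}\in\mathcal{X}^n} W^n(\mathcal{D}|\mathbf{x}) P_{X^n}(\mathbf{x}) \\
&= P_{Y^n}(\mathcal{D}) = \sum_{k=1}^{L_n} P_{Y^n}(\mathcal{D}_k) \\
&\le \sum_{k=1}^{L_n} P_{Y^n}(B(\mathbf{u}_k)) \\
&= \sum_{k=1}^{L_n} \Pr\{Y^n \in B(\mathbf{u}_k)\} \\
&\le \frac{L_n}{\alpha},
\end{aligned} \qquad (3.4.11)$$

where the last inequality follows from (3.4.6). On the other hand, the second
term on the right-hand side of (3.4.10) is evaluated in the following way. We
notice that $W^n(B(\mathbf{x}) \cap \mathcal{D}^c\,|\,\mathbf{x}) = 0$ for $\mathbf{x} \in \{\mathbf{u}_1, \cdots, \mathbf{u}_{L_n}\}$ because $B(\mathbf{x}) \subset \mathcal{D}$.
In addition, we notice that $W^n(B(\mathbf{x}) \cap \mathcal{D}^c\,|\,\mathbf{x}) \le 1 - \lambda_n$ due to (3.4.8) for
$\mathbf{x} \notin \{\mathbf{u}_1, \cdots, \mathbf{u}_{L_n}\}$ because $\mathcal{D} = \bigcup_{i=1}^{L_n} B(\mathbf{u}_i)$. Hence, we obtain

$$\sum_{\mathbf{x}\in\mathcal{X}^n} W^n(B(\mathbf{x}) \cap \mathcal{D}^c\,|\,\mathbf{x}) P_{X^n}(\mathbf{x}) \le 1 - \lambda_n. \qquad (3.4.12)$$

By substituting (3.4.11) and (3.4.12) into (3.4.10), we can obtain

$$\Pr\{i(X^n;Y^n) > \log\alpha\} \le \frac{L_n}{\alpha} + 1 - \lambda_n,$$

that is,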






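$$\lambda_n \le \Pr\{i(X^n;Y^n) \le \log\alpha\} + \frac{L_n}{\alpha}. \qquad (3.4.13)$$

Substitution of (3.4.5) into (3.4.13) gives rise to

$$\Pr\{i(X^n;Y^n) \le \log\alpha\} + \frac{M_n}{\alpha} \le \Pr\{i(X^n;Y^n) \le \log\alpha\} + \frac{L_n}{\alpha}.$$

Therefore, $M_n \le L_n$ must hold. This means that we can choose a subcode
$\mathcal{C}_n^* = \{\mathbf{u}_1, \cdots, \mathbf{u}_{M_n}\}$ of size $M_n$ from the code $\mathcal{C}_n = \{\mathbf{u}_1, \cdots, \mathbf{u}_{L_n}\}$ constructed
above. Due to (3.4.9), the error probability $\varepsilon_n$ of this subcode satisfies

$$\varepsilon_n = \frac{1}{M_n}\sum_{k=1}^{M_n} W^n(\mathcal{D}_k^c|\mathbf{u}_k) \le \lambda_n. \qquad (3.4.14)$$

Accordingly, we obtain

$$\varepsilon_n \le \Pr\{i(X^n;Y^n) \le \log\alpha\} + \frac{M_n}{\alpha}$$

from (3.4.5). We have the claim of the lemma by recalling that $\alpha = M_n e^{n\gamma}$.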

Finally, we consider the case that the code construction described above
does not stop. We note, however, that even for this case a code satisfying the
condition of the lemma is obtained if we stop after the $M_n$-th repetition. □
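The greedy construction in this proof can be run directly for a very small blocklength. The sketch below is an illustration under assumptions of our own: the channel is a memoryless binary symmetric channel, the input is uniform and i.i.d. (so that $P_{Y^n}$ is uniform), sequences are coded as integers, and candidates are scanned once in increasing order. A single scan suffices because the set $\bigcup_i B(\mathbf{u}_i)$ only grows, so a rejected candidate can never qualify later. The function name and the parameter values are hypothetical.

```python
import math

def feinstein_greedy(n, p, M, gamma):
    """Greedy code construction for a BSC(p) with uniform i.i.d. input.

    Returns (lambda_n, codewords, decoding_regions, success_probabilities).
    """
    alpha = M * math.exp(n * gamma)
    p_y = 2.0 ** (-n)                     # P_{Y^n}(y) under the uniform input
    points = range(2 ** n)                # sequences coded as n-bit integers
    # W^n(y|x) depends only on the Hamming distance d between x and y.
    w_by_distance = [p ** d * (1.0 - p) ** (n - d) for d in range(n + 1)]

    def w(y, x):
        return w_by_distance[bin(x ^ y).count("1")]

    def ball(x):
        # B(x) = {y : i(x;y) > log alpha} = {y : W^n(y|x) > alpha * P_{Y^n}(y)}
        return {y for y in points if w(y, x) > alpha * p_y}

    # lambda_n of (3.4.5); by symmetry W^n(B(x)|x) is the same for every x.
    lam = 1.0 - sum(w(y, 0) for y in ball(0)) + M / alpha

    codewords, regions, successes, covered = [], [], [], set()
    for x in points:
        region = ball(x) - covered        # D_k = B(u_k) minus earlier B(u_i)
        mass = sum(w(y, x) for y in region)
        if mass > 1.0 - lam:              # condition (3.4.7)
            codewords.append(x)
            regions.append(region)
            successes.append(mass)
            covered |= region
    return lam, codewords, regions, successes
```

For instance, with $n = 10$, $p = 0.05$, $M_n = 4$ and $\gamma = 0.2$ the construction yields at least $M_n$ codewords, and the average error probability of the first $M_n$ of them stays below $\lambda_n$, in line with (3.4.14).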



Remark 3.4.2. It is obvious that Lemma 3.4.1 holds if the error probability
$\varepsilon_n$ is defined by

$$\varepsilon_n = \max_{1 \le k \le M_n} W^n(\mathcal{D}_k^c \,|\, \varphi_n(k)) \qquad (3.4.15)$$

instead of (3.4.14). The error probability defined by (3.4.15) is called the
maximum error probability. In fact, we can find a significant meaning of Feinstein's
lemma (Lemma 3.4.1) because Feinstein's lemma holds in the sense
of the maximum error probability. On the other hand, the error probability
defined by (3.4.14) is called the average error probability. All the theorems
and lemmas given in this chapter excluding §3.8 are valid for the two kinds
of error probability. □



Remark 3.4.3. For the case that $\varepsilon_n$ is the average error probability defined
by (3.4.14) or, more generally, defined by

$$\varepsilon_n = \sum_{k=1}^{M_n} p_k W^n(\mathcal{D}_k^c \,|\, \varphi_n(k)),$$

where $p_k$ denotes the probability that a message $k$ occurs, we can prove
Lemma 3.4.1 in a much easier way. That is, we can develop the existence of
an $(n, M_n, \varepsilon_n)$-code satisfying (3.4.3) by following the same lines in the proof
of the direct part of Theorem 3.1.1, where we replace $T_n$ in (3.1.3) with

$$T_n = \left\{(\mathbf{x},\mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n \,\middle|\, \frac{1}{n}\log\frac{W^n(\mathbf{y}|\mathbf{x})}{P_{Y^n}(\mathbf{y})} > \frac{1}{n}\log M_n + \gamma\right\}$$

for an arbitrary positive integer $M_n$. Lemma 3.4.1 with $\varepsilon_n$ defined as the average
error probability is sufficient for proving the direct part of Theorem 3.4.1
described below; we need Lemma 3.4.1 with $\varepsilon_n$ defined as the maximum error
probability when we treat the identification coding problems in Chapter 6. □



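Proof of Theorem 3.4.1.

1) Converse part:

Suppose that $R$ is ε-achievable. That is, suppose that there is an $(n, M_n, \varepsilon_n)$-code
satisfying

$$\liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R \quad \text{and} \quad \limsup_{n\to\infty} \varepsilon_n \le \varepsilon. \qquad (3.4.16)$$

Let $X^n$ be the random variable uniformly distributed over this code and $Y^n$
the output of the channel $W^n$ with $X^n$ as an input. Set $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and
$\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$. Then, for an arbitrary constant $\gamma > 0$ Lemma 3.2.2 implies
that

$$\varepsilon_n \ge \Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n}\log M_n - \gamma\right\} - e^{-n\gamma}.$$

We note here that $\liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R$ means

$$\frac{1}{n}\log M_n \ge R - \gamma \quad (\forall n \ge n_0).$$

Therefore, it follows that

$$\varepsilon_n \ge \Pr\left\{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R - 2\gamma\right\} - e^{-n\gamma}.$$

By taking $\limsup_{n\to\infty}$ of both sides, we have

$$\varepsilon \ge J(R - 2\gamma|\mathbf{X}) \qquad (3.4.17)$$

due to (3.4.16). Now, define

$$R_0 = \sup_{\mathbf{X}} \sup\{R \ge 0 \mid J(R|\mathbf{X}) \le \varepsilon\} \qquad (3.4.18)$$

and assume that $R > R_0$. Then, we can choose $\gamma > 0$ satisfying $R - 2\gamma > R_0$.
Since $R_0 \ge \sup\{R \ge 0 \mid J(R|\mathbf{X}) \le \varepsilon\}$, it holds that $R - 2\gamma >
\sup\{R \mid J(R|\mathbf{X}) \le \varepsilon\}$. Hence, $J(R - 2\gamma|\mathbf{X}) > \varepsilon$, which contradicts (3.4.17).
Thus, $R \le R_0$ must be satisfied.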






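2) Direct part:

We set $R = R_0 - 3\gamma$ ($\gamma > 0$) for $R_0$ defined by (3.4.18) and show that $R$ is $\varepsilon$-achievable. Equation (3.4.18) implies that there exists an input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ satisfying

$$\sup\{R \mid J(R|\mathbf{X}) \le \varepsilon\} \ge R_0 - \gamma = R + 2\gamma.$$

Therefore, there exists an $R'$ with $R' \ge R + \gamma$ satisfying $J(R'|\mathbf{X}) \le \varepsilon$. Since $J(R|\mathbf{X})$ is monotone increasing with respect to $R$, we have

$$J(R + \gamma|\mathbf{X}) \le \varepsilon. \tag{3.4.19}$$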

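On the other hand, setting $M_n = e^{nR}$, it trivially holds that $\liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R$. Denoting by $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ the channel output corresponding to $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$, Lemma 3.4.1 guarantees the existence of an $(n, M_n, \varepsilon_n)$-code satisfying

$$\varepsilon_n \le \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n}\log M_n + \gamma \right\} + e^{-n\gamma},$$

i.e.,

$$\varepsilon_n \le \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R + \gamma \right\} + e^{-n\gamma}.$$

By taking $\limsup$ of both sides, it follows that

$$\limsup_{n\to\infty} \varepsilon_n \le J(R + \gamma|\mathbf{X}),$$

which, together with (3.4.19), implies $\limsup_{n\to\infty} \varepsilon_n \le \varepsilon$.

□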



Example 3.4.1. Let us consider here the $\varepsilon$-channel capacity of the mixed channel defined by

$$W^n(\mathbf{y}|\mathbf{x}) = \alpha_1 W_1^n(\mathbf{y}|\mathbf{x}) + \alpha_2 W_2^n(\mathbf{y}|\mathbf{x}).$$

Denote by $Y_1^n$ and $Y_2^n$ the outputs of the channels $W_1^n$ and $W_2^n$ corresponding to an arbitrary input $X^n$, respectively. Then, Remark 3.3.1 tells us that the information-spectrum of

$$\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)}$$

becomes the weighted sum of the information-spectra of

$$\frac{1}{n}\log\frac{W^n(Y_1^n|X^n)}{P_{Y^n}(Y_1^n)} \quad\text{and}\quad \frac{1}{n}\log\frac{W^n(Y_2^n|X^n)}{P_{Y^n}(Y_2^n)}$$

according to weights $\alpha_1$ and $\alpha_2$, respectively. Therefore, the $\varepsilon$-channel capacity $C(\varepsilon|\mathbf{W})$ of the mixed channel is generally a function depending on $\varepsilon$ (cf. Wolfowitz [101]). □
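The two-mode shape of such an information-spectrum can be illustrated numerically. The following is only a minimal sketch under stated assumptions: the two component channels are taken to be binary symmetric channels with crossover probabilities 0.2 and 0.05, the weights are 0.3 and 0.7, the input is uniform and memoryless (so that the output distribution is uniform), and logarithms are natural. The function names `log_mix_likelihood` and `sample_density_rates` are hypothetical and do not come from the text.

```python
import math
import random


def log_mix_likelihood(d, n, ps, alphas):
    """log W^n(y|x) for a mixture of BSCs when y differs from x in d positions."""
    terms = [math.log(a) + d * math.log(p) + (n - d) * math.log(1 - p)
             for p, a in zip(ps, alphas)]
    m = max(terms)
    # log-sum-exp keeps the tiny likelihoods from underflowing
    return m + math.log(sum(math.exp(t - m) for t in terms))


def sample_density_rates(n, ps, alphas, samples, seed=0):
    """Draw samples of (1/n) log [ W^n(Y^n|X^n) / P_{Y^n}(Y^n) ] in nats."""
    rng = random.Random(seed)
    rates = []
    for _ in range(samples):
        # One component channel is selected for the whole block.
        p = rng.choices(ps, weights=alphas)[0]
        d = sum(rng.random() < p for _ in range(n))  # number of flipped bits
        # Uniform input gives a uniform output, so P_{Y^n}(y) = 2^{-n}.
        rates.append(math.log(2) + log_mix_likelihood(d, n, ps, alphas) / n)
    return rates


rates = sample_density_rates(n=1000, ps=[0.2, 0.05], alphas=[0.3, 0.7], samples=1000)
# The threshold 0.35 nats lies between the two component capacities.
frac_low = sum(r <= 0.35 for r in rates) / len(rates)
print(frac_low)
```

Under these assumptions the fraction of samples below the threshold should sit near the weight of the noisier component, which is the reason the achievable rate depends on the tolerated error probability.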






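Example 3.4.2. Let the input and output alphabets be $\mathcal{X} = \mathcal{Y} = \{0, 1\}$. Denote by $\mathbf{W}_\theta = \{W_\theta\}$ the binary symmetric stationary memoryless channel with the crossover probability $\theta$ ($0 \le \theta \le 1$). Let us consider the mixed channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ defined by

$$W^n(\mathbf{y}|\mathbf{x}) = \int_0^1 W_\theta^n(\mathbf{y}|\mathbf{x})\,dw(\theta) \quad ((\mathbf{x},\mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n) \tag{3.4.20}$$

which mixes $\mathbf{W}_\theta$ in a continuous way, and try to find the $\varepsilon$-channel capacity, where $w(\cdot)$ is an arbitrary probability measure satisfying

$$\int_0^1 dw(\theta) = 1.$$

Denote by $\mathbf{Y}_\theta$ the output of the channel $\mathbf{W}_\theta$ corresponding to an input $\mathbf{X}$. We first note that the following two supremums

$$\sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y}_\theta) = \sup_{\mathbf{X}} \overline{I}(\mathbf{X};\mathbf{Y}_\theta) = \log 2 - h(\theta)$$

are simultaneously attained by the stationary memoryless input $\mathbf{X} = \overline{\mathbf{X}} = \{\overline{X}\}$ satisfying $\Pr\{\overline{X} = 0\} = \Pr\{\overline{X} = 1\} = \frac{1}{2}$ ($0 \le \theta \le 1$). Thus, we obtain

$$\int_{\{\theta \mid \log 2 - h(\theta) < R\}} dw(\theta) \le J_{\mathbf{W}}(R|\overline{\mathbf{X}}) = \inf_{\mathbf{X}} J_{\mathbf{W}}(R|\mathbf{X}) \le \int_{\{\theta \mid \log 2 - h(\theta) \le R\}} dw(\theta)$$

from Lemma 3.3.3. Hence, owing to Theorem 3.4.1 the $\varepsilon$-channel capacity of the mixed channel $\mathbf{W}$ defined in (3.4.20) is given by

$$C(\varepsilon|\mathbf{W}) = \sup\left\{ R \;\middle|\; \int_{\{\theta \mid \log 2 - h(\theta) < R\}} dw(\theta) \le \varepsilon \right\} \quad (0 \le \forall\varepsilon < 1). \tag{3.4.21}$$

In particular, we can verify that Fig. 3.3 gives the $\varepsilon$-capacity $C(\varepsilon|\mathbf{W})$ for the two binary symmetric stationary memoryless channels $\mathbf{W}_1$ and $\mathbf{W}_2$ ($\mathcal{X} = \mathcal{Y} = \{0, 1\}$). Here, $p_1$ and $p_2$ are the crossover probabilities of $\mathbf{W}_1$ and $\mathbf{W}_2$ ($h(p_1) > h(p_2)$) and $\mathbf{W}$ is the mixture of $\mathbf{W}_1$ and $\mathbf{W}_2$ with weights $\alpha_1$ and $\alpha_2$, respectively.

□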
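For a mixing measure concentrated on finitely many crossover probabilities, a formula of the form $\sup\{R \mid w\{\theta : \log 2 - h(\theta) < R\} \le \varepsilon\}$ reduces to a scan over the sorted component capacities. The sketch below is an illustration only; it assumes a discrete mixing measure, natural logarithms, and the strict inequality inside the set, and the names `binary_entropy` and `eps_capacity` are hypothetical.

```python
import math


def binary_entropy(p):
    """Binary entropy h(p) in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


def eps_capacity(crossovers, weights, eps):
    """sup{R : total weight of components with capacity < R is at most eps}.

    Assumes a finite mixture of binary symmetric channels and 0 <= eps < 1.
    """
    caps = sorted((math.log(2) - binary_entropy(p), w)
                  for p, w in zip(crossovers, weights))
    mass = 0.0
    for cap, w in caps:
        mass += w
        # The first capacity whose cumulative weight exceeds eps is the supremum:
        # any larger R would put more than eps of the weight strictly below R.
        if mass > eps:
            return cap
    return caps[-1][0]


c1 = math.log(2) - binary_entropy(0.2)    # noisier component
c2 = math.log(2) - binary_entropy(0.05)   # cleaner component
for eps in (0.0, 0.1, 0.3, 0.5):
    print(eps, eps_capacity([0.2, 0.05], [0.3, 0.7], eps))
```

With weights 0.3 and 0.7 the result is a step function of the error tolerance: the smaller capacity while the tolerance is below 0.3, and the larger capacity from 0.3 onward.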



Remark 3.4.4. Example 3.4.2 can be regarded as a special case of the formula of the $\varepsilon$-channel capacity for the “regular decomposable discrete channels” ($\mathcal{X}$ and $\mathcal{Y}$ are assumed to be finite) by Winkelbauer [97]. □



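[Fig. 3.3. The $\varepsilon$-channel capacity $C(\varepsilon|\mathbf{W})$ plotted against $\varepsilon$; the vertical axis is marked at $1 - h(p_1)$ and $1 - h(p_2)$, the horizontal axis at $0$, $\alpha_1$ and $1$.]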



Example 3.4.3 (Verdú and Han [91]). Let us consider the additive channel given in Example 3.2.1 with the nonstationary and nonergodic noise process $\mathbf{Z}$ defined in Example 1.6.3 in Chapter 1 (note that the source $\mathbf{X}$ there is written as $\mathbf{Z}$ here). Since $\overline{H}(\mathbf{Z}) = \log 2$ as was explained in Example 1.6.3, (3.2.9) tells us that the channel capacity of this additive channel satisfies $C(\mathbf{W}) = 0$. On the other hand, if the channel input $\mathbf{X}$ is the stationary memoryless source satisfying $P_X(0) = P_X(1) = \frac{1}{2}$, then $J(R|\mathbf{X})$ in (3.4.1) is equal to $J(R|\mathbf{X}) = \frac{R}{\log 2}$. Hence, Theorem 3.4.1 implies that the $\varepsilon$-channel capacity is given by

$$C(\varepsilon|\mathbf{W}) = \varepsilon \log 2 \quad (0 \le \forall\varepsilon < 1).$$



3.5 Strong Converse Theorem on Channel Coding 

A strong converse theorem also holds for coding over a general channel; it corresponds to the strong converse theorem on source coding given in §1.5. This section is devoted to a description of the strong converse property of a general channel.

Definition 3.5.1. Denote by $C(\mathbf{W})$ the channel capacity of a channel $\mathbf{W}$. If for any $R$ satisfying $R > C(\mathbf{W})$ all the $(n, M_n, \varepsilon_n)$-codes with

$$\liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R$$

satisfy

$$\lim_{n\to\infty} \varepsilon_n = 1,$$

the channel $\mathbf{W}$ is said to satisfy the strong converse property. □

In this section as well, we denote by $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ the output of a channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ with an input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$. Define:






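Definition 3.5.2.

$$\overline{I}(\mathbf{X};\mathbf{Y}) = \text{p-}\limsup_{n\to\infty} \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)}.$$

We call $\overline{I}(\mathbf{X};\mathbf{Y})$ the spectral sup-mutual information rate of $(\mathbf{X}, \mathbf{Y})$. Letting $\underline{I}(\mathbf{X};\mathbf{Y})$ be the spectral inf-mutual information rate defined in Definition 3.2.1, we have the following theorem:

Theorem 3.5.1 (Strong converse theorem: Verdú and Han [91]). A channel $\mathbf{W}$ satisfies the strong converse property if and only if

$$\sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y}) = \sup_{\mathbf{X}} \overline{I}(\mathbf{X};\mathbf{Y}). \tag{3.5.1}$$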



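Proof.

1) Sufficiency:

Assume (3.5.1) first. Set $R = C(\mathbf{W}) + 3\gamma$ for $\gamma > 0$ and consider an arbitrary $(n, M_n, \varepsilon_n)$-code satisfying $\liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R$. Let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ be the input uniformly distributed over this code and denote by $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ the output of the channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ corresponding to $\mathbf{X}$. Then, it follows from Lemma 3.2.2 that

$$\varepsilon_n \ge \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n}\log M_n - \gamma \right\} - e^{-n\gamma}.$$

On the other hand, since $\liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R$, it holds that

$$\frac{1}{n}\log M_n \ge R - \gamma \quad (\forall n \ge n_0).$$

By using this, we have

$$\varepsilon_n \ge \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R - 2\gamma \right\} - e^{-n\gamma}. \tag{3.5.2}$$

We note here that Theorem 3.2.1 and (3.5.1) imply that

$$R = C(\mathbf{W}) + 3\gamma = \sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y}) + 3\gamma = \sup_{\mathbf{X}} \overline{I}(\mathbf{X};\mathbf{Y}) + 3\gamma,$$

which leads to

$$R - 2\gamma = \sup_{\mathbf{X}} \overline{I}(\mathbf{X};\mathbf{Y}) + \gamma.$$

Therefore, it follows from the definition of $\overline{I}(\mathbf{X};\mathbf{Y})$ that

$$\lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R - 2\gamma \right\} = 1,$$

which, together with (3.5.2), shows $\lim_{n\to\infty} \varepsilon_n = 1$.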






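2) Necessity:

Set $R = C(\mathbf{W}) + \gamma$ for an arbitrary $\gamma > 0$ and define $M_n = e^{nR}$. It trivially holds that $\liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R > C(\mathbf{W})$. If we consider the $(n, M_n, \varepsilon_n)$-code given in Lemma 3.4.1 for an arbitrary input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and the output $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ corresponding to $\mathbf{X}$, we have

$$\varepsilon_n \le \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R + \gamma \right\} + e^{-n\gamma}.$$

Since $\lim_{n\to\infty} \varepsilon_n = 1$ holds from the assumption of the strong converse property,

$$\lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R + \gamma \right\} = 1$$

must be satisfied. This means $R + \gamma \ge \overline{I}(\mathbf{X};\mathbf{Y})$. Since the input $\mathbf{X}$ is arbitrary, we obtain

$$R + \gamma \ge \sup_{\mathbf{X}} \overline{I}(\mathbf{X};\mathbf{Y}).$$

By substituting $R = C(\mathbf{W}) + \gamma = \sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y}) + \gamma$ into the inequality above, it holds that

$$\sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y}) + 2\gamma \ge \sup_{\mathbf{X}} \overline{I}(\mathbf{X};\mathbf{Y}),$$

which leads to

$$\sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y}) \ge \sup_{\mathbf{X}} \overline{I}(\mathbf{X};\mathbf{Y})$$

because $\gamma > 0$ is arbitrary. Since the inequality in the opposite direction is clear, we obtain

$$\sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y}) = \sup_{\mathbf{X}} \overline{I}(\mathbf{X};\mathbf{Y}).$$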

Remark 3.5.1. As is obvious from the definition of the strong converse property, if a channel satisfies the strong converse property, the $\varepsilon$-channel capacity $C(\varepsilon|\mathbf{W})$ becomes a constant

$$C(\varepsilon|\mathbf{W}) = C(\mathbf{W}) \quad (0 \le \forall\varepsilon < 1) \tag{3.5.3}$$

independent of $\varepsilon$. (Wolfowitz [102] and Csiszár and Körner [19] define the channel with the “strong converse property” as the channel satisfying (3.5.3). However, the “strong converse property” defined in this book is much stronger than (3.5.3) in general, as can be seen from the example below.) On the other hand, since the $\varepsilon$-channel capacity of the mixed channel given in Example 3.4.1 depends on $\varepsilon$, we can see that that channel does not satisfy the strong converse property. In fact, mixed channels do not satisfy the strong converse property in general.






We can see from the following example that (3.5.3) does not imply the strong converse property. Let $W^n$ be the stationary memoryless binary symmetric channel with the crossover probabilities $p_1$ and $p_2$ ($h(p_1) > h(p_2)$) for the cases that $n$ is odd and even, respectively. While for the channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ defined in this way we have (3.5.3), the channel does not satisfy the strong converse property because

$$\sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y}) = \log 2 - h(p_1) < \sup_{\mathbf{X}} \overline{I}(\mathbf{X};\mathbf{Y}) = \log 2 - h(p_2).$$

The following theorem describes a relationship between the spectral sup/inf-mutual information rates and the ordinary mutual information.

Theorem 3.5.2. If at least one of an input alphabet $\mathcal{X}$ and an output alphabet $\mathcal{Y}$ is a finite set, it holds that

$$\underline{I}(\mathbf{X};\mathbf{Y}) \le \liminf_{n\to\infty} \frac{1}{n} I(X^n;Y^n) \le \limsup_{n\to\infty} \frac{1}{n} I(X^n;Y^n) \le \overline{I}(\mathbf{X};\mathbf{Y}) \le \min(\log|\mathcal{X}|, \log|\mathcal{Y}|), \tag{3.5.4}$$

where $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ denotes the channel output corresponding to $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$, and the first inequality in (3.5.4) holds for any abstract (not necessarily finite) $\mathcal{X}$ and $\mathcal{Y}$.



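Proof. It suffices to establish

$$\underline{I}(\mathbf{X};\mathbf{Y}) \le \liminf_{n\to\infty} \frac{1}{n} I(X^n;Y^n) \tag{3.5.5}$$

and

$$\limsup_{n\to\infty} \frac{1}{n} I(X^n;Y^n) \le \overline{I}(\mathbf{X};\mathbf{Y}) \le \min(\log|\mathcal{X}|, \log|\mathcal{Y}|). \tag{3.5.6}$$

First, we establish (3.5.5). In this part of the proof we do not use the assumption that at least one of $\mathcal{X}$ and $\mathcal{Y}$ is finite. Set

$$i(\mathbf{x};\mathbf{y}) = \log\frac{W^n(\mathbf{y}|\mathbf{x})}{P_{Y^n}(\mathbf{y})} \quad (\mathbf{x} \in \mathcal{X}^n,\ \mathbf{y} \in \mathcal{Y}^n) \tag{3.5.7}$$

for simplicity. Letting $\gamma > 0$ be an arbitrarily small constant, it follows that

$$\begin{aligned}
\frac{1}{n} I(X^n;Y^n) &= E\left\{\frac{1}{n} i(X^n;Y^n)\right\} \\
&= E\left\{\frac{1}{n} i(X^n;Y^n)\,\mathbf{1}\!\left[\frac{1}{n} i(X^n;Y^n) \le 0\right]\right\} \\
&\quad + E\left\{\frac{1}{n} i(X^n;Y^n)\,\mathbf{1}\!\left[0 < \frac{1}{n} i(X^n;Y^n) \le \underline{I}(\mathbf{X};\mathbf{Y}) - \gamma\right]\right\} \\
&\quad + E\left\{\frac{1}{n} i(X^n;Y^n)\,\mathbf{1}\!\left[\underline{I}(\mathbf{X};\mathbf{Y}) - \gamma < \frac{1}{n} i(X^n;Y^n)\right]\right\} \\
&\ge E\left\{\frac{1}{n} i(X^n;Y^n)\,\mathbf{1}\!\left[\frac{1}{n} i(X^n;Y^n) \le 0\right]\right\} \\
&\quad + E\left\{\frac{1}{n} i(X^n;Y^n)\,\mathbf{1}\!\left[\underline{I}(\mathbf{X};\mathbf{Y}) - \gamma < \frac{1}{n} i(X^n;Y^n)\right]\right\} \\
&= \frac{1}{n} \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x})\, E\left\{ i(\mathbf{x};Y^n)\,\mathbf{1}\left[ i(\mathbf{x};Y^n) \le 0 \right] \,\middle|\, X^n = \mathbf{x} \right\} \\
&\quad + E\left\{\frac{1}{n} i(X^n;Y^n)\,\mathbf{1}\!\left[\underline{I}(\mathbf{X};\mathbf{Y}) - \gamma < \frac{1}{n} i(X^n;Y^n)\right]\right\}.
\end{aligned} \tag{3.5.8}$$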



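By applying Lemma 3.2.4 to the first term on the right-hand side of (3.5.8), we have

$$\begin{aligned}
\frac{1}{n} I(X^n;Y^n) &\ge \frac{1}{ne}\log\frac{1}{e} + E\left\{\frac{1}{n} i(X^n;Y^n)\,\mathbf{1}\!\left[\underline{I}(\mathbf{X};\mathbf{Y}) - \gamma < \frac{1}{n} i(X^n;Y^n)\right]\right\} \\
&\ge \frac{1}{ne}\log\frac{1}{e} + \left(\underline{I}(\mathbf{X};\mathbf{Y}) - \gamma\right) \Pr\left\{\frac{1}{n} i(X^n;Y^n) > \underline{I}(\mathbf{X};\mathbf{Y}) - \gamma\right\}.
\end{aligned} \tag{3.5.9}$$

We note here that

$$\lim_{n\to\infty} \Pr\left\{\frac{1}{n} i(X^n;Y^n) > \underline{I}(\mathbf{X};\mathbf{Y}) - \gamma\right\} = 1$$

due to the definition of $\underline{I}(\mathbf{X};\mathbf{Y})$. Therefore, by taking $\liminf_{n\to\infty}$ of both sides of (3.5.9), we obtain

$$\liminf_{n\to\infty} \frac{1}{n} I(X^n;Y^n) \ge \underline{I}(\mathbf{X};\mathbf{Y}) - \gamma.$$

Since $\gamma > 0$ is arbitrary, we have

$$\liminf_{n\to\infty} \frac{1}{n} I(X^n;Y^n) \ge \underline{I}(\mathbf{X};\mathbf{Y})$$

by letting $\gamma \to 0$, which establishes (3.5.5).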

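Next, we prove (3.5.6). In this part of the proof we need the assumption that at least one of $\mathcal{X}$ and $\mathcal{Y}$ is a finite set. Letting $\gamma > 0$ be an arbitrarily small constant, under the notation (3.5.7) it follows that

$$\begin{aligned}
\frac{1}{n} I(X^n;Y^n) &= E\left\{\frac{1}{n} i(X^n;Y^n)\,\mathbf{1}\!\left[\frac{1}{n} i(X^n;Y^n) \le \overline{I}(\mathbf{X};\mathbf{Y}) + \gamma\right]\right\} \\
&\quad + E\left\{\frac{1}{n} i(X^n;Y^n)\,\mathbf{1}\!\left[\frac{1}{n} i(X^n;Y^n) > \overline{I}(\mathbf{X};\mathbf{Y}) + \gamma\right]\right\} \\
&\le \overline{I}(\mathbf{X};\mathbf{Y}) + \gamma + E\left\{\frac{1}{n} i(X^n;Y^n)\,\mathbf{1}\!\left[\frac{1}{n} i(X^n;Y^n) > \overline{I}(\mathbf{X};\mathbf{Y}) + \gamma\right]\right\}.
\end{aligned} \tag{3.5.10}$$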

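Here, we assume without loss of generality that $\mathcal{X}$ is a finite set and show

$$\lim_{n\to\infty} E\left\{\frac{1}{n} i(X^n;Y^n)\,\mathbf{1}\!\left[\frac{1}{n} i(X^n;Y^n) \ge \log|\mathcal{X}| + \gamma\right]\right\} = 0. \tag{3.5.11}$$

To this end, since we have

$$i(X^n;Y^n) = \log\frac{P_{X^n|Y^n}(X^n|Y^n)}{P_{X^n}(X^n)} \le \log\frac{1}{P_{X^n}(X^n)}$$

from the assumption that $\mathcal{X}$ is a finite input alphabet, it suffices to establish

$$\lim_{n\to\infty} E\left\{Z_n \mathbf{1}\left[Z_n \ge \log|\mathcal{X}| + \gamma\right]\right\} = 0,$$

where

$$Z_n = \frac{1}{n}\log\frac{1}{P_{X^n}(X^n)}.$$

However, this equality was already established in the proof of Theorem 1.7.2 in §1.7 in the form of $\limsup_{n\to\infty} B_n = 0$. Thus, we have (3.5.11). Next, by noticing that

$$E\left\{\frac{1}{n} i(X^n;Y^n)\,\mathbf{1}\!\left[\frac{1}{n} i(X^n;Y^n) \ge \log|\mathcal{X}| + \gamma\right]\right\} \ge \left(\log|\mathcal{X}| + \gamma\right) \Pr\left\{\frac{1}{n} i(X^n;Y^n) \ge \log|\mathcal{X}| + \gamma\right\},$$

we obtain

$$\overline{I}(\mathbf{X};\mathbf{Y}) \le \log|\mathcal{X}| + \gamma$$

because (3.5.11) shows that

$$\lim_{n\to\infty} \Pr\left\{\frac{1}{n} i(X^n;Y^n) \ge \log|\mathcal{X}| + \gamma\right\} = 0.$$

Since $\gamma > 0$ is arbitrary, we now have $\overline{I}(\mathbf{X};\mathbf{Y}) \le \log|\mathcal{X}|$. We evaluate the second term on the right-hand side of (3.5.10) as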



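$$\begin{aligned}
F_n &= E\left\{\frac{1}{n} i(X^n;Y^n)\,\mathbf{1}\!\left[\frac{1}{n} i(X^n;Y^n) > \overline{I}(\mathbf{X};\mathbf{Y}) + \gamma\right]\right\} \\
&= E\left\{\frac{1}{n} i(X^n;Y^n)\,\mathbf{1}\!\left[\log|\mathcal{X}| + \gamma > \frac{1}{n} i(X^n;Y^n) > \overline{I}(\mathbf{X};\mathbf{Y}) + \gamma\right]\right\} \\
&\quad + E\left\{\frac{1}{n} i(X^n;Y^n)\,\mathbf{1}\!\left[\frac{1}{n} i(X^n;Y^n) \ge \log|\mathcal{X}| + \gamma\right]\right\} \\
&\le \left(\log|\mathcal{X}| + \gamma\right) \Pr\left\{\frac{1}{n} i(X^n;Y^n) > \overline{I}(\mathbf{X};\mathbf{Y}) + \gamma\right\} \\
&\quad + E\left\{\frac{1}{n} i(X^n;Y^n)\,\mathbf{1}\!\left[\frac{1}{n} i(X^n;Y^n) \ge \log|\mathcal{X}| + \gamma\right]\right\}
\end{aligned} \tag{3.5.12}$$

and take (3.5.11) and

$$\lim_{n\to\infty} \Pr\left\{\frac{1}{n} i(X^n;Y^n) > \overline{I}(\mathbf{X};\mathbf{Y}) + \gamma\right\} = 0$$

into consideration. Then, it follows that $\lim_{n\to\infty} F_n = 0$. Hence, the combination of this with (3.5.10) leads to

$$\limsup_{n\to\infty} \frac{1}{n} I(X^n;Y^n) \le \overline{I}(\mathbf{X};\mathbf{Y}) + \gamma.$$

By recalling that $\gamma > 0$ is arbitrary, we finally obtain

$$\limsup_{n\to\infty} \frac{1}{n} I(X^n;Y^n) \le \overline{I}(\mathbf{X};\mathbf{Y}).$$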

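Theorem 3.2.1, Theorem 3.5.1 and Theorem 3.5.2 immediately yield the following corollary:

Corollary 3.5.1 (Verdú and Han [91]). Let $\mathbf{W}$ be a channel with input and output alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively, and suppose that at least one of $\mathcal{X}$ and $\mathcal{Y}$ is a finite set. If $\mathbf{W}$ satisfies the strong converse property, it holds that

$$C(\mathbf{W}) = \sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y}) = \sup_{\mathbf{X}} \overline{I}(\mathbf{X};\mathbf{Y}) = \lim_{n\to\infty} \frac{1}{n} \sup_{X^n} I(X^n;Y^n),$$

where $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ denotes the channel output corresponding to $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$.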



Remark 3.5.2. So far, in most cases, information theorists have attempted to find various kinds of formulas for the channel capacity in terms of the mutual information rate (the expectation of the mutual information density rate) or in terms of information quantiles. However, this kind of approach cannot be viewed as quite successful in establishing a unifying capacity formula. To reach reasonable capacity formulas, many people assumed case-by-case regularity conditions such as “stationarity,” “ergodicity,” “finite-memoryness,” “information stability,” “decomposability,” or “d-bar continuity,” and so on (see, e.g., Nedoma [73], Wolfowitz [101], Dobrushin [23], Pinsker [76], Hu [51], Kieffer [56], Winkelbauer [97], Ziv [105], Gray and Ornstein [34]). In particular, the notion of information quantile is originally found in Shannon [78], and subsequently in Winkelbauer [95], [97], [98], [99], Kieffer [56], and Gray and Ornstein [34], and it may be regarded as a germ of the present information-spectrum approach. The two are, however, completely different in both generality and perspective: information quantile approaches require, as a stringent condition, the almost sure convergence of the mutual information density rate, which holds only for a restricted class of channels, whereas the information-spectrum approach does not need any such assumptions and is applicable to any general channel in principle.

In this connection, we have shown that a unified general formula (of information-spectrum nature) for the channel capacity, Theorem 3.2.1, can be obtained if we give up using the notion of the mutual information rate and introduce the notion of the spectral inf-mutual information rate. Nevertheless, Corollary 3.5.1 guarantees that the channel capacity can be expressed in terms of the ordinary mutual information rate if a channel satisfies the strong converse property and at least one of the input and output alphabets of the channel is a finite set. □



3.6 Channel Capacity with Cost Constraint 

In some channel coding problems the cost of codewords must be taken into account. This section is devoted to a general theory of channel coding problems with cost constraint.

For $n = 1, 2, \cdots$ fix a mapping $c_n : \mathcal{X}^n \to \mathbf{R}$ arbitrarily and set

$$c = \{c_n\}_{n=1}^{\infty}, \tag{3.6.1}$$

where $\mathbf{R}$ denotes the set of all real numbers. For $\mathbf{x} \in \mathcal{X}^n$ we call $c_n(\mathbf{x})$ the cost of $\mathbf{x}$ and $\frac{1}{n} c_n(\mathbf{x})$ the cost per symbol of $\mathbf{x}$. Hereafter, we call $c$ in (3.6.1) the cost function. In the channel coding problem with cost constraint we require that codewords belonging to a code

$$\mathcal{C}_n = \{\mathbf{u}_1, \mathbf{u}_2, \cdots, \mathbf{u}_{M_n}\} \quad (\mathbf{u}_i \in \mathcal{X}^n)$$

satisfy

$$\frac{1}{n} c_n(\mathbf{u}_i) \le \Gamma \quad (i = 1, 2, \cdots, M_n) \tag{3.6.2}$$

for all $n = 1, 2, \cdots$, where $\Gamma$ is an arbitrarily given constant. We call a constraint in the form of (3.6.2) the cost constraint $\Gamma$ (from this point of view the channel coding problems treated so far can be regarded as special cases where $c_n(\mathbf{x}) = n$ and $\Gamma = 1$).




The channel coding problem with cost constraint can be formulated in the same way as the channel coding problem without cost constraint. Denoting by

$$(n, M_n, \varepsilon_n, \Gamma)$$

an $(n, M_n, \varepsilon_n)$-code satisfying the cost constraint $\Gamma$, we replace Definitions 3.1.1 and 3.1.2 in §3.1 with the following definitions:

Definition 3.6.1.

Rate $R$ is $\Gamma$-achievable $\stackrel{\text{def}}{\Longleftrightarrow}$ There exists an $(n, M_n, \varepsilon_n, \Gamma)$-code satisfying

$$\lim_{n\to\infty} \varepsilon_n = 0 \quad\text{and}\quad \liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R.$$

Definition 3.6.2 ($\Gamma$-cost channel capacity).

$$C_s(\Gamma|\mathbf{W}) = \sup\{R \mid R \text{ is } \Gamma\text{-achievable}\}.$$

In order to establish a formula for the $\Gamma$-cost channel capacity we set

$$\mathcal{X}^n(\Gamma) = \left\{ \mathbf{x} \in \mathcal{X}^n \;\middle|\; \frac{1}{n} c_n(\mathbf{x}) \le \Gamma \right\}$$

and denote by $\mathcal{S}_\Gamma$ the set of all the input processes $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ satisfying

$$\Pr\{X^n \in \mathcal{X}^n(\Gamma)\} = 1$$

for all $n = 1, 2, \cdots$. Then, we have the following theorem corresponding to Theorem 3.2.1.

Theorem 3.6.1. The $\Gamma$-cost channel capacity $C_s(\Gamma|\mathbf{W})$ of a channel $\mathbf{W}$ is given by

$$C_s(\Gamma|\mathbf{W}) = \sup_{\mathbf{X} \in \mathcal{S}_\Gamma} \underline{I}(\mathbf{X};\mathbf{Y}),$$

where $\mathbf{Y}$ denotes the output process of the channel $\mathbf{W}$ with an input process $\mathbf{X}$.

Proof. We can prove this theorem by following the same lines as the proof of Theorem 3.2.1, restricting the general input process $\mathbf{X}$ to $\mathbf{X} \in \mathcal{S}_\Gamma$. Note that $X^n$ appearing in Lemma 3.2.2 satisfies

$$\mathbf{X} = \{X^n\}_{n=1}^{\infty} \in \mathcal{S}_\Gamma.$$

Remark 3.6.1. Theorem 3.6.1 does not hold in general if we modify the definition of $\mathcal{S}_\Gamma$ to all the input processes $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ satisfying $\frac{1}{n} E c_n(X^n) \le \Gamma$ for all $n = 1, 2, \cdots$. One of the typical settings in which Theorem 3.6.1 holds under this modified definition of $\mathcal{S}_\Gamma$ is that of stationary memoryless channels with the additive cost, which is given below. □






Let us consider here a special case where the cost function is expressed as

$$c_n(\mathbf{x}) = \sum_{i=1}^{n} c(x_i) \quad (\mathbf{x} = (x_1, x_2, \cdots, x_n))$$

by using a function $c : \mathcal{X} \to \mathbf{R}$. Such a cost function is called additive. Under this additive cost we have the following theorem corresponding to Theorem 3.1.1 as a special case of Theorem 3.6.1.

Theorem 3.6.2. For arbitrary input and output alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively, the $\Gamma$-cost channel capacity $C_s(\Gamma|\mathbf{W})$ of a stationary memoryless channel $\mathbf{W} = \{W\}$ with additive cost constraint is given by

$$C_s(\Gamma|\mathbf{W}) = \sup_{X : E c(X) \le \Gamma} I(X;Y), \tag{3.6.3}$$

where $Y$ denotes the output of the channel $W$ due to the input $X$.
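A single-letter formula of this kind, the supremum of the mutual information over inputs whose expected cost is at most $\Gamma$, can be evaluated directly for small alphabets. The sketch below is an illustration under assumptions that are not in the text: the channel is a binary symmetric channel with crossover probability `p`, the per-letter cost is $c(0) = 0$, $c(1) = 1$ (so the constraint reads $\Pr\{X = 1\} \le \Gamma$), logarithms are natural, and the supremum is approximated by a grid search. All function names are hypothetical.

```python
import math


def h(p):
    """Binary entropy in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


def mutual_information_bsc(q, p):
    """I(X;Y) for a BSC with crossover p and input law Pr{X=1} = q."""
    out1 = q * (1 - p) + (1 - q) * p  # Pr{Y = 1}
    return h(out1) - h(p)


def cost_capacity_bsc(p, gamma, grid=1000):
    """Grid approximation of sup over q <= gamma of I(X;Y), with cost c(x) = x."""
    q_max = min(max(gamma, 0.0), 1.0)
    return max(mutual_information_bsc(q_max * k / grid, p) for k in range(grid + 1))


print(cost_capacity_bsc(0.1, 0.5))  # constraint inactive: log 2 - h(0.1)
print(cost_capacity_bsc(0.1, 0.1))  # constraint active: h(0.18) - h(0.1)
```

In this toy setting the constraint becomes inactive once $\Gamma \ge 1/2$, because the uniform input already maximizes the mutual information.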

Remark 3.6.2. The right-hand side of (3.6.3) is concave and therefore continuous as a function of $\Gamma$. This can be verified similarly to Remark 5.2.1 in Chapter 5. Notice that in Remark 5.2.1 the role of $X$ is changed to the role of $Y$ and min is used instead of sup. In addition, note that $W = P_{Y|X}$ is fixed here while $P_X$ is fixed in Remark 5.2.1. □



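Proof of Theorem 3.6.2.

The proof given below is completely different from the standard proof (cf. Csiszár and Körner [19]) for the case that $\mathcal{X}$ and $\mathcal{Y}$ are finite sets. We start from Theorem 3.6.1 and use the following argument, most of which was used when we obtained Theorem 3.2.2 from Theorem 3.2.1 in §3.2.

1) First, for an arbitrary $\mathbf{X} \in \mathcal{S}_\Gamma$ denote by $\mathbf{Y}$ the output from the channel with $\mathbf{X}$ as an input. Then, owing to (3.2.24) in the proof of Theorem 3.2.2 we have

$$\underline{I}(\mathbf{X};\mathbf{Y}) \le \liminf_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} I(X_i^{(n)};Y_i^{(n)}), \tag{3.6.4}$$

where

$$X^n = (X_1^{(n)}, X_2^{(n)}, \cdots, X_n^{(n)}), \quad Y^n = (Y_1^{(n)}, Y_2^{(n)}, \cdots, Y_n^{(n)}).$$

Since $\mathbf{X} \in \mathcal{S}_\Gamma$ and the cost is assumed to be additive, we have

$$\frac{1}{n} \sum_{i=1}^{n} E c(X_i^{(n)}) \le \Gamma. \tag{3.6.5}$$

If we introduce the random variable $Q^{(n)}$ satisfying

$$\Pr\{Q^{(n)} = i\} = \frac{1}{n} \quad (i = 1, 2, \cdots, n)$$

as was defined for a mixed channel in §3.3 and define $X^{(n)}$ and $Y^{(n)}$ by

$$X^{(n)} = X_i^{(n)}, \quad Y^{(n)} = Y_i^{(n)} \quad \text{given } Q^{(n)} = i,$$

(3.6.4) and (3.6.5) can be written as

$$\underline{I}(\mathbf{X};\mathbf{Y}) \le \liminf_{n\to\infty} I(X^{(n)};Y^{(n)}|Q^{(n)}), \tag{3.6.6}$$

$$E c(X^{(n)}) \le \Gamma, \tag{3.6.7}$$

respectively. By noticing that $Q^{(n)} \to X^{(n)} \to Y^{(n)}$ forms a Markov chain, it holds that

$$I(X^{(n)};Y^{(n)}|Q^{(n)}) \le I(X^{(n)}Q^{(n)};Y^{(n)}) = I(X^{(n)};Y^{(n)}),$$

which, together with (3.6.6), implies

$$\underline{I}(\mathbf{X};\mathbf{Y}) \le \liminf_{n\to\infty} I(X^{(n)};Y^{(n)}). \tag{3.6.8}$$

Consequently, in view of (3.6.7) and (3.6.8) we obtain

$$\underline{I}(\mathbf{X};\mathbf{Y}) \le \sup_{X : E c(X) \le \Gamma} I(X;Y).$$

By recalling that $\mathbf{X} \in \mathcal{S}_\Gamma$ is arbitrary, it follows that

$$\sup_{\mathbf{X} \in \mathcal{S}_\Gamma} \underline{I}(\mathbf{X};\mathbf{Y}) \le \sup_{X : E c(X) \le \Gamma} I(X;Y). \tag{3.6.9}$$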

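2) Next, we establish the inequality in the opposite direction. Letting $\delta > 0$ be an arbitrarily small constant, define $\overline{X}$ and $\overline{Y}$ as $X$ and $Y$ attaining the supremum of

$$\sup_{X : E c(X) \le \Gamma - \delta} I(X;Y),$$

respectively. Let

$$\overline{X}^n = (\overline{X}_1, \overline{X}_2, \cdots, \overline{X}_n), \quad \overline{Y}^n = (\overline{Y}_1, \overline{Y}_2, \cdots, \overline{Y}_n)$$

be stationary independent sequences subject to $P_{\overline{X}}$ and $P_{\overline{Y}}$, respectively, and set $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ and $\overline{\mathbf{Y}} = \{\overline{Y}^n\}_{n=1}^{\infty}$. Then, from Khinchin’s law of large numbers it holds that

$$\underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}}) = \sup_{X : E c(X) \le \Gamma - \delta} I(X;Y). \tag{3.6.10}$$

Notice here that $\overline{\mathbf{X}} \in \mathcal{S}_\Gamma$ is not guaranteed. We construct $\mathbf{X}_1$ satisfying $\mathbf{X}_1 \in \mathcal{S}_\Gamma$ from $\overline{\mathbf{X}}$ in the following way. First, we note that $c(\overline{X}_i)$ ($i = 1, 2, \cdots, n$) are independent and identically distributed. By taking

$$\frac{1}{n} \sum_{i=1}^{n} E c(\overline{X}_i) \le \Gamma - \delta$$

into consideration, we obtain

$$\Pr\left\{ \frac{1}{n} \sum_{i=1}^{n} c(\overline{X}_i) \le \Gamma \right\} \to 1 \quad \text{as } n \to \infty$$

due to Khinchin’s law of large numbers again. Hence, setting

$$A_n = \left\{ \mathbf{x} \in \mathcal{X}^n \;\middle|\; \frac{1}{n} \sum_{i=1}^{n} c(x_i) \le \Gamma \right\},$$

it follows that

$$\gamma_n \equiv P_{\overline{X}^n}(A_n) \to 1 \quad \text{as } n \to \infty. \tag{3.6.11}$$

If we define the input process $\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$ by

$$P_{X_1^n}(\mathbf{x}) = \begin{cases} \dfrac{1}{\gamma_n} P_{\overline{X}^n}(\mathbf{x}) & \text{for } \mathbf{x} \in A_n, \\ 0 & \text{for } \mathbf{x} \notin A_n, \end{cases}$$

clearly $\mathbf{X}_1 \in \mathcal{S}_\Gamma$. Now, define $\mathbf{Y}_1 = \{Y_1^n\}_{n=1}^{\infty}$ as the output of the channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ corresponding to the input $\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$. Then, for any $\mathbf{y} \in \mathcal{Y}^n$ it holds that

$$\begin{aligned}
P_{\overline{Y}^n}(\mathbf{y}) &= \sum_{\mathbf{x} \in \mathcal{X}^n} W^n(\mathbf{y}|\mathbf{x}) P_{\overline{X}^n}(\mathbf{x}) \\
&\ge \sum_{\mathbf{x} \in A_n} W^n(\mathbf{y}|\mathbf{x}) P_{\overline{X}^n}(\mathbf{x}) \\
&= \gamma_n \sum_{\mathbf{x} \in A_n} W^n(\mathbf{y}|\mathbf{x}) P_{X_1^n}(\mathbf{x}) \\
&= \gamma_n P_{Y_1^n}(\mathbf{y}).
\end{aligned}$$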

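Therefore,

$$\frac{1}{n}\log\frac{1}{P_{\overline{Y}^n}(Y_1^n)} \le \frac{1}{n}\log\frac{1}{P_{Y_1^n}(Y_1^n)} + \frac{1}{n}\log\frac{1}{\gamma_n}. \tag{3.6.12}$$

Then, it follows that

$$\Pr\left\{ \frac{1}{n}\log\frac{W^n(Y_1^n|X_1^n)}{P_{\overline{Y}^n}(Y_1^n)} \le \alpha \right\} \ge \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y_1^n|X_1^n)}{P_{Y_1^n}(Y_1^n)} \le \alpha - \frac{1}{n}\log\frac{1}{\gamma_n} \right\} \tag{3.6.13}$$

for an arbitrary $\alpha$ with $\alpha < \underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}})$. We notice here that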



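$$\Pr\left\{ \frac{1}{n}\log\frac{W^n(\overline{Y}^n|\overline{X}^n)}{P_{\overline{Y}^n}(\overline{Y}^n)} \le \alpha \right\} \ge \gamma_n \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y_1^n|X_1^n)}{P_{\overline{Y}^n}(Y_1^n)} \le \alpha \right\}.$$

Then, substitution of this inequality into (3.6.13) yields

$$\Pr\left\{ \frac{1}{n}\log\frac{W^n(\overline{Y}^n|\overline{X}^n)}{P_{\overline{Y}^n}(\overline{Y}^n)} \le \alpha \right\} \ge \gamma_n \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y_1^n|X_1^n)}{P_{Y_1^n}(Y_1^n)} \le \alpha - \frac{1}{n}\log\frac{1}{\gamma_n} \right\}.$$

By noticing $\alpha < \underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}})$ and $\gamma_n \to 1$ as $n \to \infty$, we obtain

$$\lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y_1^n|X_1^n)}{P_{Y_1^n}(Y_1^n)} \le \alpha - \frac{1}{n}\log\frac{1}{\gamma_n} \right\} = 0. \tag{3.6.14}$$

Since $\alpha$ is arbitrary as far as it satisfies $\alpha < \underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}})$, (3.6.14) implies

$$\underline{I}(\overline{\mathbf{X}};\overline{\mathbf{Y}}) \le \underline{I}(\mathbf{X}_1;\mathbf{Y}_1),$$

which, together with (3.6.10), yields

$$\sup_{X : E c(X) \le \Gamma - \delta} I(X;Y) \le \underline{I}(\mathbf{X}_1;\mathbf{Y}_1).$$

We recall here that $\mathbf{X}_1 \in \mathcal{S}_\Gamma$ is clearly satisfied from the definition of $\mathbf{X}_1$. Consequently,

$$\sup_{X : E c(X) \le \Gamma - \delta} I(X;Y) \le \sup_{\mathbf{X} \in \mathcal{S}_\Gamma} \underline{I}(\mathbf{X};\mathbf{Y}).$$

We note that the left-hand side is concave and therefore continuous with respect to $\delta$ (cf. Remark 3.6.2). Thus, we obtain

$$\sup_{X : E c(X) \le \Gamma} I(X;Y) \le \sup_{\mathbf{X} \in \mathcal{S}_\Gamma} \underline{I}(\mathbf{X};\mathbf{Y}) \tag{3.6.15}$$

by letting $\delta \to 0$.

3) Finally, the combination of (3.6.15) with (3.6.9) yields

$$\sup_{X : E c(X) \le \Gamma} I(X;Y) = \sup_{\mathbf{X} \in \mathcal{S}_\Gamma} \underline{I}(\mathbf{X};\mathbf{Y}).$$

That is, Theorem 3.6.2 is obtained from Theorem 3.6.1. □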
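The truncation step in part 2) of the proof rests on the fact that an i.i.d. input whose expected per-letter cost is strictly below the constraint meets the per-block constraint with probability tending to one. A minimal numerical sketch of this, under assumptions not in the text: the input letters are Bernoulli with $\Pr\{X = 1\} = q$, the cost is $c(x) = x$, and the probability is computed exactly from the binomial distribution. The function name `gamma_n` is hypothetical.

```python
import math


def gamma_n(n, q, big_gamma):
    """Pr{(1/n) * sum of c(X_i) <= big_gamma} for i.i.d. Bernoulli(q) letters, c(x) = x."""
    # Largest number of ones allowed by the block constraint; the small
    # offset guards against floating-point rounding of n * big_gamma.
    k_max = min(math.floor(n * big_gamma + 1e-12), n)
    return sum(math.comb(n, k) * q ** k * (1 - q) ** (n - k)
               for k in range(k_max + 1))


# Expected cost 0.3 sits strictly below the constraint 0.4.
for n in (10, 50, 200):
    print(n, gamma_n(n, 0.3, 0.4))
```

The printed probabilities increase toward one with the block length, which is what allows the conditioned input to lose only a vanishing amount of rate.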



We can obtain the following theorem on the $\Gamma$-cost channel capacity from Lemma 3.3.1 and Theorem 3.6.1. This theorem corresponds to Theorem 3.3.1.

Theorem 3.6.3. The $\Gamma$-cost channel capacity $C_s(\Gamma|\mathbf{W})$ of the mixed channel $\mathbf{W}$ is given by

$$C_s(\Gamma|\mathbf{W}) = \sup_{\mathbf{X} \in \mathcal{S}_\Gamma} \min\left(\underline{I}(\mathbf{X};\mathbf{Y}_1), \underline{I}(\mathbf{X};\mathbf{Y}_2)\right). \tag{3.6.16}$$

From consideration of the argument on cost constraint when we obtained Theorem 3.6.2 from Theorem 3.6.1, we can obtain the following theorem, which corresponds to Theorem 3.3.2, from Theorem 3.6.3; this is similar to the method yielding Theorem 3.3.2 from Theorem 3.3.1.

Theorem 3.6.4. For arbitrary input and output alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively, the $\Gamma$-cost channel capacity of the mixed channel $\mathbf{W}$ of stationary memoryless channels $\mathbf{W}_1 = \{W_1\}$ and $\mathbf{W}_2 = \{W_2\}$ with additive cost constraint is given by

$$C_s(\Gamma|W_1, W_2) = \sup_{X : E c(X) \le \Gamma} \min\left(I(X;Y_1), I(X;Y_2)\right),$$

where $Y_1$ and $Y_2$ denote the outputs of the channels $W_1$ and $W_2$ with $X$ as the inputs, respectively. □
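A max-min expression of this form, the supremum over cost-feasible inputs of the smaller of two mutual informations, can be approximated by a grid search when the input is binary. The sketch below is an illustration only, assuming natural logarithms, the per-letter cost $c(0) = 0$, $c(1) = 1$, and channels given as row-stochastic matrices `channel[x][y]`; the function names are hypothetical.

```python
import math


def mutual_information(q, channel):
    """I(X;Y) in nats for a binary input with Pr{X=1} = q; channel[x][y] = W(y|x)."""
    px = [1 - q, q]
    n_out = len(channel[0])
    py = [sum(px[x] * channel[x][y] for x in range(2)) for y in range(n_out)]
    total = 0.0
    for x in range(2):
        for y in range(n_out):
            w = channel[x][y]
            if px[x] > 0 and w > 0:
                total += px[x] * w * math.log(w / py[y])
    return total


def mixed_cost_capacity(ch1, ch2, gamma, grid=1000):
    """Grid approximation of sup over q <= gamma of min(I(X;Y1), I(X;Y2)), cost c(x) = x."""
    q_max = min(max(gamma, 0.0), 1.0)
    return max(
        min(mutual_information(q_max * k / grid, ch1),
            mutual_information(q_max * k / grid, ch2))
        for k in range(grid + 1)
    )


bsc_noisy = [[0.8, 0.2], [0.2, 0.8]]
bsc_clean = [[0.95, 0.05], [0.05, 0.95]]
print(mixed_cost_capacity(bsc_noisy, bsc_clean, gamma=1.0))
print(mixed_cost_capacity(bsc_noisy, bsc_clean, gamma=0.1))
```

For two binary symmetric channels the noisier one is the bottleneck at every input distribution, so the value coincides with the cost-constrained capacity of the noisier component alone; with components of different shapes the minimizing channel can change with the input.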

In particular, if an input alphabet $\mathcal{X}$ and an output alphabet $\mathcal{Y}$ are finite sets, Theorem 3.6.4 can be generalized as follows:

Theorem 3.6.5. Suppose that $\mathcal{X}$ and $\mathcal{Y}$ are finite input and output alphabets, respectively. If each $\mathbf{W}_\theta = \{W_\theta\}$ is a stationary memoryless channel, the $\Gamma$-cost channel capacity $C_s(\Gamma|\mathbf{W})$ with additive cost constraint of the mixed channel $\mathbf{W}$ defined in (3.3.36) is given by

$$C_s(\Gamma|\mathbf{W}) = \sup_{X : E c(X) \le \Gamma} w\text{-ess.inf}\; I(X;Y_\theta), \tag{3.6.17}$$

where $Y_\theta$ denotes the output of the channel $W_\theta$ with $X$ as the input.

Proof. We can prove this theorem by using Theorem 3.6.1. Actually, we apply the arguments used in the proof of Theorem 3.3.7, taking the argument given in the proof 2) of Theorem 3.6.2 into account. We notice here that (i) the right-hand side of (3.6.17) is concave and therefore continuous with respect to $\Gamma$, (ii) we arbitrarily choose the input process $\mathbf{X}$ appearing in the proof 2) of Theorem 3.3.7 satisfying $\mathbf{X} \in \mathcal{S}_\Gamma$, and (iii) the random variable $X^{(n)}$ satisfies $E c(X^{(n)}) \le \Gamma$ as a consequence of (ii). □

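We conclude this section by considering the $\varepsilon$-channel capacity with cost constraint. To this end, for an arbitrary fixed $0 \le \varepsilon < 1$ define:

Definition 3.6.3.

Rate $R$ is $(\varepsilon, \Gamma)$-achievable $\stackrel{\text{def}}{\Longleftrightarrow}$ There exists an $(n, M_n, \varepsilon_n, \Gamma)$-code satisfying

$$\limsup_{n\to\infty} \varepsilon_n \le \varepsilon \quad\text{and}\quad \liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R.$$

Definition 3.6.4 ($(\varepsilon, \Gamma)$-cost channel capacity).

$$C_s(\varepsilon, \Gamma|\mathbf{W}) = \sup\{R \mid R \text{ is } (\varepsilon, \Gamma)\text{-achievable}\}.$$

Letting $J(R|\mathbf{X})$ be the function given in Definition 3.4.3, we have the following theorem corresponding to Theorem 3.4.1.

Theorem 3.6.6. The $(\varepsilon, \Gamma)$-cost channel capacity $C_s(\varepsilon, \Gamma|\mathbf{W})$ of a channel $\mathbf{W}$ is given by

$$C_s(\varepsilon, \Gamma|\mathbf{W}) = \sup_{\mathbf{X} \in \mathcal{S}_\Gamma} \sup\{R \mid J(R|\mathbf{X}) \le \varepsilon\} \quad (0 \le \forall\varepsilon < 1). \tag{3.6.18}$$

In addition, the right-hand side of (3.6.18) is right-continuous and monotone increasing as a function of $\varepsilon$.

Proof. We can prove this theorem by following the same lines as the proof of Theorem 3.4.1 except for the choice of the input $\mathbf{X}$. We need to restrict $\mathbf{X}$ to $\mathbf{X} \in \mathcal{S}_\Gamma$. Notice that, as can be seen from the proof of Theorem 3.4.1, the $(n, M_n, \varepsilon_n)$-code can be replaced with the $(n, M_n, \varepsilon_n, \Gamma)$-code in Lemma 3.4.1 if $\mathbf{X} = \{X^n\}_{n=1}^{\infty} \in \mathcal{S}_\Gamma$. □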



3.7 Strong Converse Property of Channel with Cost 
Constraint 

In Theorem 3.5.1 in §3.5 we gave a necessary and sufficient condition under which a channel satisfies the strong converse property. In this section we give a necessary and sufficient condition on the channel with cost constraint to satisfy the strong converse property. By using the obtained result, we also observe three examples of channels satisfying the strong converse property.

First, we define the strong converse property of a channel with cost constraint similarly to Definition 3.5.1 as follows:

Definition 3.7.1. Let $C_s(\Gamma|\mathbf{W})$ be the $\Gamma$-cost channel capacity of a channel $\mathbf{W}$. If for any $R$ satisfying $R > C_s(\Gamma|\mathbf{W})$, all the $(n, M_n, \varepsilon_n, \Gamma)$-codes with

$$\liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R$$

satisfy

$$\lim_{n\to\infty} \varepsilon_n = 1,$$

the channel $\mathbf{W}$ is said to satisfy the strong converse property under cost constraint $\Gamma$. □






The following theorem holds under this definition.

Theorem 3.7.1. A channel $\mathbf{W}$ satisfies the strong converse property under cost constraint $\Gamma$ if and only if

$$\sup_{\mathbf{X} \in \mathcal{S}_\Gamma} \underline{I}(\mathbf{X};\mathbf{Y}) = \sup_{\mathbf{X} \in \mathcal{S}_\Gamma} \overline{I}(\mathbf{X};\mathbf{Y}). \tag{3.7.1}$$

Proof. This theorem can be proved in the same way as Theorem 3.5.1 except for restricting the general input process $\mathbf{X}$ to $\mathbf{X} \in \mathcal{S}_\Gamma$. □

Remark 3.7.1. Theorem 3.7.1 tells us that, if a channel $\mathbf{W}$ satisfies the strong converse property under cost constraint $\Gamma$, the $(\varepsilon, \Gamma)$-cost channel capacity $C_s(\varepsilon, \Gamma|\mathbf{W})$ defined in Definition 3.6.4 is given by

$$C_s(\varepsilon, \Gamma|\mathbf{W}) = C_s(\Gamma|\mathbf{W}) \quad (0 \le \forall\varepsilon < 1), \tag{3.7.2}$$

which does not depend on $\varepsilon$. However, note that (3.7.2) does not always imply the strong converse property under the cost constraint $\Gamma$. □

Corollary 3.5.1 can be generalized for the channels with cost constraint in the following way:

Corollary 3.7.1. Suppose that at least one of an input alphabet $\mathcal{X}$ and an output alphabet $\mathcal{Y}$ of a channel $\mathbf{W}$ is a finite set. If $\mathbf{W}$ satisfies the strong converse property under cost constraint $\Gamma$, then it holds that

$$C_s(\Gamma|\mathbf{W}) = \sup_{\mathbf{X} \in \mathcal{S}_\Gamma} \underline{I}(\mathbf{X};\mathbf{Y}) = \sup_{\mathbf{X} \in \mathcal{S}_\Gamma} \overline{I}(\mathbf{X};\mathbf{Y}) = \lim_{n\to\infty} \frac{1}{n} \sup_{X^n \in \mathcal{P}_n(\Gamma)} I(X^n;Y^n),$$

where $\mathcal{P}_n(\Gamma)$ denotes the set of all input variables $X^n$ satisfying

$$\Pr\left\{ \frac{1}{n} c_n(X^n) \le \Gamma \right\} = 1$$

and $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ denotes the channel output corresponding to $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$.

Proof. This corollary is obvious from Theorem 3.5.2, Theorem 3.6.1, Theorem 3.7.1 and the definition of $\mathcal{S}_\Gamma$. □

Now we consider two examples as applications of Theorem 3.7.1. First, we have the following theorem that is used in §6.6 in Chapter 6 as well.

Theorem 3.7.2. If $\mathcal{X}$ and $\mathcal{Y}$ are finite input and output alphabets, respectively, then a stationary memoryless channel $\mathbf{W} = \{W\}$ with additive cost constraint satisfies the strong converse property for each cost constraint $\Gamma$.

Remark 3.7.2. Under the additive cost $c(x) = 1$ ($\forall x \in \mathcal{X}$) with $\Gamma = 1$ (i.e., for the case of no cost constraint), Theorem 3.7.2 was first proved by Wolfowitz [100] and then by Gallager [30], Arimoto [5] and others. □



Proof of Theorem 3.7.2.

By taking the proof 2) of Theorem 3.6.2 in the preceding section into account, in view of Theorem 3.7.1 it suffices to prove

$$\sup_{\mathbf{X} \in \mathcal{S}_\Gamma} \overline{I}(\mathbf{X};\mathbf{Y}) \le \sup_{X : E c(X) \le \Gamma} I(X;Y) = \max_{X : E c(X) \le \Gamma} I(X;Y) = C_s(\Gamma|\mathbf{W}) \tag{3.7.3}$$

in order to establish the strong converse property. We prove (3.7.3) hereafter. Denoting by $\overline{X}$ and $\overline{Y}$ the optimal $X$ and $Y$ attaining the maximum in (3.7.3), respectively, we note that the Kuhn-Tucker theorem and Theorem 3.6.2 yield the following lemma:

Lemma 3.7.1. If $\mathcal{X}$ and $\mathcal{Y}$ are finite input and output alphabets, respectively, then for all $x \in \mathcal{X}$ we have

$$D(W(\cdot|x)\|P_{\overline{Y}}) = \lambda_0 (c(x) - \Gamma) + C_s(\Gamma|\mathbf{W}) - \lambda(x) \tag{3.7.4}$$

for some $\lambda_0 \ge 0$ and $\lambda(x) \ge 0$ ($x \in \mathcal{X}$), where $\Gamma \ge \min_{x} c(x)$ is assumed. □



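Now, let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ be an arbitrary input satisfying $\mathbf{X} \in \mathcal{S}_\Gamma$ and denote by $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ the output of the stationary memoryless channel $\mathbf{W} = \{W\}$ with $\mathbf{X}$ as an input. Setting

$$i(X^n;Y^n) = \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)}, \tag{3.7.5}$$

it clearly holds that

$$i(X^n;Y^n) = \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{\overline{Y}^n}(Y^n)} - \frac{1}{n}\log\frac{P_{Y^n}(Y^n)}{P_{\overline{Y}^n}(Y^n)},$$

where $P_{\overline{Y}^n}(\mathbf{y}) = \prod_{i=1}^{n} P_{\overline{Y}}(y_i)$ ($\mathbf{y} = (y_1, y_2, \cdots, y_n)$). Since Lemma 3.2.1 guarantees

$$\text{p-}\liminf_{n\to\infty} \frac{1}{n}\log\frac{P_{Y^n}(Y^n)}{P_{\overline{Y}^n}(Y^n)} \ge 0,$$

it follows that

$$\begin{aligned}
\text{p-}\limsup_{n\to\infty}\, i(X^n;Y^n) &\le \text{p-}\limsup_{n\to\infty} \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{\overline{Y}^n}(Y^n)} - \text{p-}\liminf_{n\to\infty} \frac{1}{n}\log\frac{P_{Y^n}(Y^n)}{P_{\overline{Y}^n}(Y^n)} \\
&\le \text{p-}\limsup_{n\to\infty} \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{\overline{Y}^n}(Y^n)}.
\end{aligned} \tag{3.7.6}$$

On the other hand, setting $X^n = (X_1^{(n)}, X_2^{(n)}, \cdots, X_n^{(n)})$ and $Y^n = (Y_1^{(n)}, Y_2^{(n)}, \cdots, Y_n^{(n)})$ for each block length $n$, it holds that

$$\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{\overline{Y}^n}(Y^n)} = \frac{1}{n} \sum_{i=1}^{n} \log\frac{W(Y_i^{(n)}|X_i^{(n)})}{P_{\overline{Y}}(Y_i^{(n)})}. \tag{3.7.7}$$

We now fix an $\mathbf{x} = (x_1, x_2, \cdots, x_n) \in \mathcal{X}^n$ arbitrarily and define

$$I(Y_i^{(n)}|x_i) = \log\frac{W(Y_i^{(n)}|x_i)}{P_{\overline{Y}}(Y_i^{(n)})}. \tag{3.7.8}$$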



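Then, $I(Y_i^{(n)}|x_i)$ ($i = 1, 2, \cdots, n$) are independent under the conditional probability distribution $W^n(\cdot|\mathbf{x})$ of $Y^n$ given $X^n = \mathbf{x}$ owing to the assumption that the channel is memoryless. In addition, while for each $i = 1, 2, \cdots, n$ the conditional expectation of $I(Y_i^{(n)}|x_i)$ given $X^n = \mathbf{x}$ is written as

$$E_{\mathbf{x}}\left[ I(Y_i^{(n)}|x_i) \right] = D(W(\cdot|x_i)\|P_{\overline{Y}}),$$

in view of Lemma 3.7.1 this can be expressed as

$$E_{\mathbf{x}}\left[ I(Y_i^{(n)}|x_i) \right] = \lambda_0 (c(x_i) - \Gamma) + C_s(\Gamma|\mathbf{W}) - \lambda(x_i).$$

If we restrict $\mathbf{x} = (x_1, x_2, \cdots, x_n) \in \mathcal{X}^n$ to $\mathbf{x}$ meeting the constraint

$$\frac{1}{n} \sum_{i=1}^{n} c(x_i) \le \Gamma, \tag{3.7.9}$$

we obtain

$$E_{\mathbf{x}}\left[ \frac{1}{n} \sum_{i=1}^{n} I(Y_i^{(n)}|x_i) \right] \le C_s(\Gamma|\mathbf{W})$$

because $\lambda_0 \ge 0$ and $\lambda(x_i) \ge 0$. We note here that there is no $(x, y) \in \mathcal{X} \times \mathcal{Y}$ satisfying $W(y|x) > 0$ and $P_{\overline{Y}}(y) = 0$ because (3.7.4) guarantees that $D(W(\cdot|x)\|P_{\overline{Y}})$ is bounded for all $x \in \mathcal{X}$. Hence, the variance $\sigma_i^2$ of each $I(Y_i^{(n)}|x_i)$ under the conditional probability distribution $W^n(\cdot|\mathbf{x})$ is uniformly bounded by a constant that depends on neither $i$ nor $x_i$. That is, $\sigma_i^2 \le \sigma_0^2$ ($i = 1, 2, \cdots, n$) for some $\sigma_0^2$. Therefore, Chebyshev’s inequality tells us that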



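$$\Pr\left\{ \frac{1}{n} \sum_{i=1}^{n} I(Y_i^{(n)}|x_i) \ge C_s(\Gamma|\mathbf{W}) + \gamma \;\middle|\; X^n = \mathbf{x} \right\} \le \frac{\sigma_0^2}{n\gamma^2},$$

where $\gamma > 0$ is an arbitrary constant. While this inequality is valid for all $\mathbf{x}$ satisfying (3.7.9), we note that this inequality is valid for all realizations $\mathbf{x}$ of $X^n$ because $\mathbf{X} = \{X^n\}_{n=1}^{\infty} \in \mathcal{S}_\Gamma$ is assumed. Thus, it holds that

$$\Pr\left\{ \frac{1}{n} \sum_{i=1}^{n} I(Y_i^{(n)}|X_i^{(n)}) \ge C_s(\Gamma|\mathbf{W}) + \gamma \right\} \le \frac{\sigma_0^2}{n\gamma^2},$$

which can be written as

$$\Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{\overline{Y}^n}(Y^n)} \ge C_s(\Gamma|\mathbf{W}) + \gamma \right\} \le \frac{\sigma_0^2}{n\gamma^2}$$

from (3.7.7) and (3.7.8). Consequently,

$$\lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{\overline{Y}^n}(Y^n)} \ge C_s(\Gamma|\mathbf{W}) + \gamma \right\} = 0.$$

By recalling that $\gamma > 0$ is arbitrary, we obtain

$$\text{p-}\limsup_{n\to\infty} \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{\overline{Y}^n}(Y^n)} \le C_s(\Gamma|\mathbf{W}),$$

which, together with (3.7.5) and (3.7.6), leads to

$$\text{p-}\limsup_{n\to\infty} \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le C_s(\Gamma|\mathbf{W}),$$

i.e.,

$$\overline{I}(\mathbf{X};\mathbf{Y}) \le C_s(\Gamma|\mathbf{W}).$$

Since $\mathbf{X}$ is arbitrary as far as it satisfies $\mathbf{X} \in \mathcal{S}_\Gamma$, we have

$$\sup_{\mathbf{X} \in \mathcal{S}_\Gamma} \overline{I}(\mathbf{X};\mathbf{Y}) \le C_s(\Gamma|\mathbf{W}). \tag{3.7.10}$$

This establishes (3.7.3).

□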
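The concentration step used in this proof (independent per-letter terms with bounded variance, so that their average rarely exceeds its mean by a fixed margin) can be observed in a small simulation. The sketch below is an illustration under assumptions not in the text: the channel is a binary symmetric channel with crossover probability `p`, there is effectively no cost constraint, the reference output distribution is uniform, and logarithms are natural. The function names are hypothetical.

```python
import math
import random


def h(p):
    """Binary entropy in nats."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


def sample_average_density(n, p, seed=0):
    """Average of n independent terms log( W(Y_i|x_i) / P(Y_i) ) for a BSC(p).

    With a uniform reference output law, each term equals log(2(1-p)) when
    the letter is received correctly and log(2p) when it is flipped, whatever
    the input letter x_i is.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        flipped = rng.random() < p
        w = p if flipped else 1 - p
        total += math.log(w / 0.5)
    return total / n


p = 0.1
capacity = math.log(2) - h(p)
average = sample_average_density(20000, p)
print(capacity, average)
```

Each term has mean equal to the divergence between the channel row and the uniform output law, which here is the capacity, so the sample average settles near that value as the block length grows.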



Remark 3.7.3. The definition of the strong converse property (Definition 3.7.1) and Theorem 3.7.2 imply that the $\varepsilon$-channel capacity (see §3.4) of a stationary memoryless channel with additive cost constraint becomes a constant independent of $0 \le \varepsilon < 1$ for each cost constraint $\Gamma$ if both of the input and output alphabets are finite sets. □



Next, let us consider an example of channels whose input and output 
alphabets are not finite sets. One of the most important and well-known of 
such channels is the additive white Gaussian noise channel (AWGN channel). 
That is, letting $\mathcal{X}$ and $\mathcal{Y}$ be an input alphabet and an output alphabet, respectively, we call the stationary memoryless channel with $\mathcal{X} = \mathcal{Y} = \mathbf{R}$ and the transition probability density $W: \mathcal{X} \to \mathcal{Y}$ given by

\[ W(y|x) = \frac{1}{\sqrt{2\pi N}} e^{-\frac{(y-x)^2}{2N}} \tag{3.7.11} \]

for some $N > 0$ the AWGN channel. This $W(y|x)$ is regarded as the Gaussian probability density function with mean $x$ and variance $N$ if we fix $x$ and view $W(y|x)$ as a function of $y$. This channel can be interpreted in the following way. Letting $X$ be an input variable over $\mathcal{X}$, the output variable $Y$ is determined by

\[ Y = X + Z, \tag{3.7.12} \]

where $Z$ is the Gaussian random variable with zero mean and variance $N$ independent of $X$ (i.e., noise). In this sense $N$ is called the noise power. It is usual that we consider the additive cost defined by

\[ c_n(\mathbf{x}) = \sum_{i=1}^{n} x_i^2 \quad (\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n) \tag{3.7.13} \]

for this channel. Then, the cost constraint $P$ can be written as

\[ \frac{1}{n} \sum_{i=1}^{n} x_i^2 \le P. \tag{3.7.14} \]

Here, $P$ is called the signal power. We have the following two theorems that are used in §6.7 in Chapter 6 as well.

Theorem 3.7.3 (Shannon [77]). The $P$-cost channel capacity $C_s(P|\mathbf{W})$ of the AWGN channel $\mathbf{W} = \{W\}$ under the cost constraint (3.7.14) is given by

\[ C_s(P|\mathbf{W}) = \frac{1}{2} \log\left(1 + \frac{P}{N}\right). \tag{3.7.15} \]

Theorem 3.7.4 (Shannon [80]). The AWGN channel $\mathbf{W} = \{W\}$ satisfies the strong converse property under the cost constraint (3.7.14) for each $P$.

Proof of Theorem 3.7.3 and Theorem 3.7.4.
If we establish

\[ \sup_{\mathbf{X} \in \mathcal{S}_P} \overline{I}(\mathbf{X};\mathbf{Y}) \le \frac{1}{2} \log\left(1 + \frac{P}{N}\right) \le \sup_{\mathbf{X} \in \mathcal{S}_P} \underline{I}(\mathbf{X};\mathbf{Y}), \tag{3.7.16} \]

then in view of Theorem 3.6.1 and Theorem 3.7.1, with $P$ used instead of $\Gamma$, Theorem 3.7.3 and Theorem 3.7.4 are simultaneously established. Hereafter, we establish (3.7.16). We prepare the following lemma before doing so.

Lemma 3.7.2.

\[ \sup_{X:\, E(X^2) \le P} I(X;Y) = \frac{1}{2} \log\left(1 + \frac{P}{N}\right). \tag{3.7.17} \]

Here, $Y$ denotes the channel output with $X$ as the input and the supremum is taken with respect to all random variables $X$ satisfying $E(X^2) \le P$.






Proof. Let $X$ be an arbitrary random variable satisfying $E(X^2) \le P$. By expressing $Y$ in the form of (3.7.12), it follows that

\[ \begin{aligned} I(X;Y) &= H(Y) - H(Y|X) \\ &= H(Y) - H(X + Z|X) \\ &= H(Y) - H(Z) \\ &= H(Y) - \frac{1}{2} \log(2\pi e N), \end{aligned} \tag{3.7.18} \]

where the entropy of a continuous random variable $U$ is defined by

\[ H(U) = -\int f(u) \log f(u)\, du \]

and $f(u)$ is the probability density function of $U$. (The quantity on the right-hand side is called the differential entropy of $U$. The conditional entropy $H(U|S)$ is defined in the same way by using the conditional probability density function.) We note here that, since $Z$ is independent of $X$, the variance of $Y$ can be expressed as

\[ V(Y) = V(X) + V(Z), \]

where $V(X)$ and $V(Z)$ are the variances of $X$ and $Z$, respectively. Since we have $V(X) \le P$ and $V(Z) = N$, it follows that

\[ V(Y) \le P + N. \tag{3.7.19} \]

We recall here the fact that $H(Y)$ is maximized subject to the constraint (3.7.19) if $Y$ is the Gaussian random variable with zero mean and variance $P + N$ (the maximum entropy theorem: cf. Cover and Thomas [17]). This fact can be verified in the following way. Let $g(y)$ be the Gaussian probability density function with zero mean and variance $\sigma^2$ and $f(y)$ an arbitrary probability density function with zero mean and variance $\sigma^2$. We can prove that the divergence for continuous probability density functions is nonnegative, i.e.,

\[ D(f\|g) = \int f(y) \log \frac{f(y)}{g(y)}\, dy \ge 0, \]

in the same way as the nonnegativity of the divergence for discrete probability distributions was shown in Chapter 1. From the inequality above, we immediately obtain

\[ \frac{1}{2} \log(2\pi e \sigma^2) \ge -\int f(y) \log f(y)\, dy. \]

Hence, the optimal $Y = \overline{Y}$ maximizing $H(Y)$ is realized by $X$ if and only if $X$ is equal to the Gaussian random variable $\overline{X}$ with zero mean and variance $P$. Clearly, we have $H(\overline{Y}) = \frac{1}{2} \log(2\pi e (P + N))$. Then, (3.7.18) implies

\[ \sup_{X:\, E(X^2) \le P} I(X;Y) = I(\overline{X};\overline{Y}) = \frac{1}{2} \log(2\pi e (P + N)) - \frac{1}{2} \log(2\pi e N) = \frac{1}{2} \log\left(1 + \frac{P}{N}\right), \]

which completes the proof of the lemma.



□ 
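As a quick numerical sanity check of Lemma 3.7.2 and of the maximum entropy property used in its proof, the following Python sketch may help. The function names are illustrative only, the values of $P$ and $N$ are arbitrary, and the closed-form entropies of the uniform and Laplace densities are standard facts quoted here as assumptions rather than results of this section. All logarithms are natural (nats).

```python
import math

def gaussian_entropy(var):
    """Differential entropy (nats) of a Gaussian density with variance var."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

def uniform_entropy(var):
    """Differential entropy (nats) of a zero-mean uniform density with variance var."""
    width = math.sqrt(12.0 * var)  # a uniform density of width w has variance w^2 / 12
    return math.log(width)

def laplace_entropy(var):
    """Differential entropy (nats) of a zero-mean Laplace density with variance var."""
    scale = math.sqrt(var / 2.0)  # a Laplace density with scale b has variance 2 b^2
    return 1.0 + math.log(2.0 * scale)

def awgn_capacity(P, N):
    """Right-hand side of (3.7.17): (1/2) log(1 + P/N)."""
    return 0.5 * math.log(1.0 + P / N)

P, N = 3.0, 1.0  # illustrative signal and noise powers
# With a Gaussian input of variance P, the output is Gaussian with variance P + N,
# so (3.7.18) gives I(X;Y) = H(Y) - H(Z).
mutual_information = gaussian_entropy(P + N) - gaussian_entropy(N)
print(mutual_information, awgn_capacity(P, N))
```

For the same variance $P + N$, the Gaussian entropy exceeds both the uniform and the Laplace entropies, in line with the maximum entropy theorem invoked above.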



Remark 3.7.4. If only Theorem 3.7.3 is of interest, Theorem 3.7.3 follows from Theorem 3.6.2 and Lemma 3.7.2. □

Now, we are ready to prove Theorem 3.7.3 and Theorem 3.7.4.

1) The right inequality in (3.7.16) can be proved by the argument using Lemma 3.7.2 that parallels the proof 2) of Theorem 3.6.2.

2) Next, we prove the left inequality in (3.7.16). Though the proof of this part basically parallels the proof of Theorem 3.7.2, we need to note that a property corresponding to Lemma 3.7.1 does not hold for the channel with continuous input and output alphabets. Hence, under the condition that an input sequence $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ satisfies the cost constraint (3.7.14), we prove

\[ E_{\mathbf{x}}\left[ \frac{1}{n} \sum_{i=1}^{n} I(Y_i^{(n)}|x_i) \right] \le \frac{1}{2} \log\left(1 + \frac{P}{N}\right) \tag{3.7.20} \]

and evaluate the variance of $I(Y_i^{(n)}|x_i)$ given $X^n = \mathbf{x}$ with respect to the conditional probability density $W^n(\cdot|\mathbf{x})$ of $Y^n$ by direct computation. Since the proof of Theorem 3.7.2 is still valid for the case of continuous input and output alphabets, by replacing $\Gamma$ and $c(x)$ with $P$ and $x^2$, respectively, we can obtain

\[ \sup_{\mathbf{X} \in \mathcal{S}_P} \overline{I}(\mathbf{X};\mathbf{Y}) \le \frac{1}{2} \log\left(1 + \frac{P}{N}\right) \tag{3.7.21} \]

instead of (3.7.3).



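Hereafter, we give the direct computation. Recall that $I(Y_i^{(n)}|x_i)$ is defined as

\[ I(Y_i^{(n)}|x_i) = \log \frac{W(Y_i^{(n)}|x_i)}{P_{\overline{Y}}(Y_i^{(n)})} \tag{3.7.22} \]

in (3.7.8), where

\[ P_{\overline{Y}}(y) = \frac{1}{\sqrt{2\pi (P + N)}} e^{-\frac{y^2}{2(P+N)}}. \]

By using (3.7.11), we have

\[ \log \frac{W(y|x)}{P_{\overline{Y}}(y)} = \frac{1}{2} \log\left(1 + \frac{P}{N}\right) + \frac{y^2}{2(P + N)} - \frac{(y - x)^2}{2N}. \]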



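Therefore, we obtain

\[ E_{\mathbf{x}}\left[ \log \frac{W(Y_i^{(n)}|x_i)}{P_{\overline{Y}}(Y_i^{(n)})} \right] = \int_{-\infty}^{\infty} W(y|x_i) \log \frac{W(y|x_i)}{P_{\overline{Y}}(y)}\, dy = \frac{1}{2} \log\left(1 + \frac{P}{N}\right) + \frac{x_i^2 - P}{2(P + N)}. \]

Since $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ is assumed to satisfy the cost constraint (3.7.14), it follows that

\[ E_{\mathbf{x}}\left[ \frac{1}{n} \sum_{i=1}^{n} \log \frac{W(Y_i^{(n)}|x_i)}{P_{\overline{Y}}(Y_i^{(n)})} \right] = \frac{1}{2} \log\left(1 + \frac{P}{N}\right) + \frac{\sum_{i=1}^{n} x_i^2 - nP}{2n(P + N)} \le \frac{1}{2} \log\left(1 + \frac{P}{N}\right), \tag{3.7.23} \]

which is exactly (3.7.20).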

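Next, we evaluate the variance of $I(Y_i^{(n)}|x_i)$ with respect to the conditional probability density $W^n(\cdot|\mathbf{x})$ for a fixed $\mathbf{x} \in \mathcal{X}^n$. From direct computation this variance is given by

\[ \frac{2P^2}{4(P + N)^2} + \frac{x_i^2 N}{(P + N)^2}. \]

By noting that the $I(Y_i^{(n)}|x_i)$ $(i = 1, 2, \ldots, n)$ are mutually independent with respect to $W^n(\cdot|\mathbf{x})$ for a fixed $X^n = \mathbf{x}$, it follows that

\[ \begin{aligned} V_{\mathbf{x}}\left[ \frac{1}{n} \sum_{i=1}^{n} \log \frac{W(Y_i^{(n)}|x_i)}{P_{\overline{Y}}(Y_i^{(n)})} \right] &= \frac{1}{n^2} \sum_{i=1}^{n} V_{\mathbf{x}}\left[ I(Y_i^{(n)}|x_i) \right] \\ &= \frac{2P^2}{4n(P + N)^2} + \frac{N \sum_{i=1}^{n} x_i^2}{n^2 (P + N)^2} \\ &\le \frac{2P^2}{4n(P + N)^2} + \frac{NP}{n(P + N)^2}, \end{aligned} \tag{3.7.24} \]

where the inequality follows because $\mathbf{x}$ satisfies the cost constraint (3.7.14). By applying Chebyshev's inequality together with (3.7.23) and (3.7.24), we can obtain the inequality (3.7.21) in the same way as (3.7.10) was derived in the proof of Theorem 3.7.2. □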
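The closed-form mean and variance used in this direct computation can be checked numerically. The sketch below is only an illustration under stated assumptions: it integrates with a plain trapezoid rule over a window of 12 standard deviations around $x$, uses natural logarithms, and all function names and the sample values of $x$, $P$, $N$ are hypothetical choices, not part of the text.

```python
import math

def log_ratio(y, x, P, N):
    """log W(y|x) / P_Ybar(y) in nats, with reference output density N(0, P + N)."""
    return 0.5 * math.log(1.0 + P / N) + y * y / (2.0 * (P + N)) - (y - x) ** 2 / (2.0 * N)

def gaussian_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def numeric_moments(x, P, N, steps=20001, width=12.0):
    """Mean and variance of log_ratio(Y, x, P, N) for Y ~ N(x, N), by the trapezoid rule."""
    sd = math.sqrt(N)
    lo, hi = x - width * sd, x + width * sd
    h = (hi - lo) / (steps - 1)
    m1 = m2 = 0.0
    for k in range(steps):
        y = lo + k * h
        weight = h * (0.5 if k in (0, steps - 1) else 1.0)
        density = gaussian_pdf(y, x, N)
        value = log_ratio(y, x, P, N)
        m1 += weight * density * value
        m2 += weight * density * value * value
    return m1, m2 - m1 * m1

def closed_form_moments(x, P, N):
    """Mean and variance stated in the direct computation above."""
    mean = 0.5 * math.log(1.0 + P / N) + (x * x - P) / (2.0 * (P + N))
    var = 2.0 * P * P / (4.0 * (P + N) ** 2) + x * x * N / (P + N) ** 2
    return mean, var

def chebyshev_bound(n, gamma, P, N):
    """Bound (3.7.24) on the variance of the average, divided by gamma^2."""
    var_of_average = 2.0 * P * P / (4.0 * n * (P + N) ** 2) + N * P / (n * (P + N) ** 2)
    return var_of_average / gamma ** 2

print(numeric_moments(1.5, 2.0, 1.0), closed_form_moments(1.5, 2.0, 1.0))
```

The Chebyshev bound decays like $1/n$ for any fixed $\gamma > 0$, which is all the proof needs.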



We may generalize Theorem 3.7.3 to the case of more general Gaussian channels, i.e., the stationary additive but non-white Gaussian noise channels (called the ANWGN channels) with input and output alphabets $\mathcal{X} = \mathcal{Y} = \mathbf{R}$ defined as follows. Let $\mathbf{Z} = (Z_1, Z_2, \ldots)$ be the stationary non-white Gaussian noise process with mean zero. Then, the ANWGN channel is specified by

\[ Y_i^{(n)} = X_i^{(n)} + Z_i \quad (i = 1, 2, \ldots, n), \tag{3.7.25} \]

where $X_i^{(n)}$ is the $i$-th channel input and $Y_i^{(n)}$ is the corresponding channel output, and $(Z_1, Z_2, \ldots, Z_n)$ is independent of $(X_1^{(n)}, X_2^{(n)}, \ldots, X_n^{(n)})$ (additivity). We assume that the noise process $\mathbf{Z} = (Z_1, Z_2, \ldots)$ is purely nondeterministic (cf. Ihara [53]). Let this channel be denoted by $\mathbf{W} = \{W^n: \mathcal{X}^n \to \mathcal{Y}^n\}_{n=1}^{\infty}$. We have the following theorem for the ANWGN channels.

Theorem 3.7.5. The ANWGN channel $\mathbf{W} = \{W^n\}$ satisfies the strong converse property under the cost constraint (3.7.14) for each $P$. □



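Theorem 3.7.5 can be proved in three steps as follows.

Proof.
Step 1: Define the autocorrelations $\{\gamma_k\}_{k=-\infty}^{\infty}$ of the noise process $\mathbf{Z}$ by

\[ \gamma_k = E(Z_i Z_{i+k}) \quad (k = 0, 1, 2, \ldots), \tag{3.7.26} \]

\[ \gamma_k = \gamma_{-k} \quad (k = -1, -2, \ldots), \tag{3.7.27} \]

where it should be noted that the right-hand side of (3.7.26) does not depend on $i$ because of the stationarity of the noise process. By using these $\gamma_k$'s we define the $n$-dimensional covariance matrix $V_n$ by

\[ V_n = \begin{pmatrix} \gamma_0 & \gamma_1 & \gamma_2 & \cdots & \gamma_{n-1} \\ \gamma_{-1} & \gamma_0 & \gamma_1 & \cdots & \gamma_{n-2} \\ \gamma_{-2} & \gamma_{-1} & \gamma_0 & \cdots & \gamma_{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \gamma_{-(n-1)} & \gamma_{-(n-2)} & \gamma_{-(n-3)} & \cdots & \gamma_0 \end{pmatrix}. \]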



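With this $V_n$, the transition probability density of the channel $W^n$ is given by

\[ W^n(\mathbf{y}|\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n \det V_n}} \exp\left( -\frac{1}{2} (\mathbf{y} - \mathbf{x}) V_n^{-1} (\mathbf{y} - \mathbf{x})^{T} \right), \tag{3.7.29} \]

where $\mathbf{x} \in \mathcal{X}^n$ and $\mathbf{y} \in \mathcal{Y}^n$ are an input and the corresponding output, respectively.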

We now transform the channel (3.7.25) to an equivalent form as follows. We first note that, since $V_n$ is a symmetric positive-definite matrix, there exists an $n$-dimensional orthogonal matrix $U_n$ satisfying

\[ U_n^{T} V_n U_n = \begin{pmatrix} N_1^{(n)} & & 0 \\ & \ddots & \\ 0 & & N_n^{(n)} \end{pmatrix}, \tag{3.7.30} \]

where $N_i^{(n)}$ $(i = 1, 2, \ldots, n)$ are the eigenvalues of $V_n$. We notice here that $N_i^{(n)} > 0$ $(i = 1, 2, \ldots, n)$ because the noise process $\mathbf{Z}$ is purely nondeterministic. Define the modified noise process $(\tilde{Z}_1^{(n)}, \tilde{Z}_2^{(n)}, \ldots, \tilde{Z}_n^{(n)})$ by

\[ (\tilde{Z}_1^{(n)}, \tilde{Z}_2^{(n)}, \ldots, \tilde{Z}_n^{(n)}) = (Z_1, Z_2, \ldots, Z_n) U_n, \tag{3.7.31} \]

where it is evident that $\tilde{Z}_1^{(n)}, \tilde{Z}_2^{(n)}, \ldots, \tilde{Z}_n^{(n)}$ are Gaussian and mutually independent with means zero and variances $N_i^{(n)}$ for $i = 1, 2, \ldots, n$.

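Accordingly, if we define the random variables $(\tilde{X}_1^{(n)}, \tilde{X}_2^{(n)}, \ldots, \tilde{X}_n^{(n)})$ and $(\tilde{Y}_1^{(n)}, \tilde{Y}_2^{(n)}, \ldots, \tilde{Y}_n^{(n)})$ by

\[ (\tilde{X}_1^{(n)}, \tilde{X}_2^{(n)}, \ldots, \tilde{X}_n^{(n)}) = (X_1^{(n)}, X_2^{(n)}, \ldots, X_n^{(n)}) U_n, \tag{3.7.32} \]

\[ (\tilde{Y}_1^{(n)}, \tilde{Y}_2^{(n)}, \ldots, \tilde{Y}_n^{(n)}) = (Y_1^{(n)}, Y_2^{(n)}, \ldots, Y_n^{(n)}) U_n, \tag{3.7.33} \]

the channel as specified by (3.7.25) is equivalently transformed to the nonstationary but memoryless additive Gaussian channel specified by

\[ \tilde{Y}_i^{(n)} = \tilde{X}_i^{(n)} + \tilde{Z}_i^{(n)} \quad (i = 1, 2, \ldots, n). \tag{3.7.34} \]

Let this channel be denoted by $\tilde{\mathbf{W}} = \{\tilde{W}^n: \mathcal{X}^n \to \mathcal{Y}^n\}_{n=1}^{\infty}$. We notice here that under this transformation we have

\[ (\tilde{X}_1^{(n)})^2 + (\tilde{X}_2^{(n)})^2 + \cdots + (\tilde{X}_n^{(n)})^2 = (X_1^{(n)})^2 + (X_2^{(n)})^2 + \cdots + (X_n^{(n)})^2. \]

Therefore, setting

\[ \begin{aligned} c_n(\mathbf{x}) &= x_1^2 + x_2^2 + \cdots + x_n^2 \quad (\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n), \\ X^n &= (X_1^{(n)}, X_2^{(n)}, \ldots, X_n^{(n)}), \\ \tilde{X}^n &= (\tilde{X}_1^{(n)}, \tilde{X}_2^{(n)}, \ldots, \tilde{X}_n^{(n)}), \end{aligned} \tag{3.7.35} \]

and noting that the transform given in (3.7.32) and (3.7.33) preserves the mutual information, we have

\[ \frac{1}{n} \max_{X^n:\, \frac{1}{n} E c_n(X^n) \le P} I(X^n; Y^n) = \frac{1}{n} \max_{\tilde{X}^n:\, \frac{1}{n} E c_n(\tilde{X}^n) \le P} I(\tilde{X}^n; \tilde{Y}^n) \tag{3.7.36} \]

for $P > 0$, where $Y^n$ and $\tilde{Y}^n$ are the channel outputs of the channels $W^n$ and $\tilde{W}^n$ corresponding to the inputs $X^n$ and $\tilde{X}^n$, respectively. On the other hand, it is well-known that the right-hand side of (3.7.36) is explicitly written by using the technique of water-filling (e.g., see Gallager [30]) as

\[ \frac{1}{n} \max_{\tilde{X}^n:\, \frac{1}{n} E c_n(\tilde{X}^n) \le P} I(\tilde{X}^n; \tilde{Y}^n) = \frac{1}{2n} \sum_{i=1}^{n} \log\left(1 + \frac{P_i^{(n)}}{N_i^{(n)}}\right), \]

where

\[ P_i^{(n)} = \max\left[ A_P^{(n)} - N_i^{(n)},\, 0 \right] \quad (i = 1, 2, \ldots, n) \tag{3.7.38} \]

and $A_P^{(n)} > 0$ is specified by the equation

\[ \sum_{i=1}^{n} P_i^{(n)} = nP. \tag{3.7.39} \]

Here, it is easy to see from (3.7.38) and (3.7.39) that

\[ A_P^{(n)} \ge P. \tag{3.7.40} \]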
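The water-filling allocation in (3.7.38) and (3.7.39) can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the water level is found by bisection, the noise variances in the example are arbitrary, logarithms are natural, and the function names are hypothetical rather than taken from the text.

```python
import math

def water_level(noise, P, tol=1e-12):
    """Find A with sum_i max(A - N_i, 0) = n P by bisection (cf. (3.7.38), (3.7.39))."""
    target = len(noise) * P
    lo, hi = 0.0, max(noise) + target  # at hi the allocated power is at least n P
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(max(mid - Ni, 0.0) for Ni in noise) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def power_allocation(noise, P):
    """Powers P_i = max(A - N_i, 0) for the water level A."""
    A = water_level(noise, P)
    return [max(A - Ni, 0.0) for Ni in noise]

def capacity_per_symbol(noise, P):
    """(1 / 2n) * sum_i log(1 + P_i / N_i), in nats per channel use."""
    powers = power_allocation(noise, P)
    return sum(math.log(1.0 + Pi / Ni) for Pi, Ni in zip(powers, noise)) / (2.0 * len(noise))

noise = [0.5, 1.0, 4.0]  # illustrative eigenvalues N_i of the noise covariance
P = 1.0
print(water_level(noise, P), power_allocation(noise, P), capacity_per_symbol(noise, P))
```

In this example the noisiest component receives no power, and the water level exceeds $P$, consistent with (3.7.40).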

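Step 2: We now recall the following lemma:

Lemma 3.7.3 (e.g., Ihara [53]). The $P$-cost capacity $C_s(P|\mathbf{W})$ of the ANWGN channel $\mathbf{W}$ with the cost constraint (3.7.14) is given by

\[ C_s(P|\mathbf{W}) = \lim_{n\to\infty} \frac{1}{n} \max_{X^n:\, \frac{1}{n} E c_n(X^n) \le P} I(X^n; Y^n) = \lim_{n\to\infty} \frac{1}{2n} \sum_{i=1}^{n} \log\left(1 + \frac{P_i^{(n)}}{N_i^{(n)}}\right). \tag{3.7.41} \]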



Remark 3.7.5. It is also possible to give the nonlimiting formula for the right-hand side of (3.7.41). To this end, let us define the spectral density function $g(\lambda)$ of the noise process $\mathbf{Z} = (Z_1, Z_2, \ldots)$ by

\[ g(\lambda) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \gamma_k e^{ik\lambda} \quad (-\pi \le \lambda \le \pi), \tag{3.7.42} \]

where $\gamma_k$ are the autocorrelations as defined by (3.7.26) and (3.7.27). Then, Lemma 3.7.3 can be rewritten as follows:



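Lemma 3.7.4 (e.g., Ihara [53]). The $P$-cost capacity $C_s(P|\mathbf{W})$ of the ANWGN channel $\mathbf{W}$ with the cost constraint (3.7.14) is given by

\[ C_s(P|\mathbf{W}) = \frac{1}{4\pi} \int_{-\pi}^{\pi} \log\left(1 + \frac{f(\lambda)}{g(\lambda)}\right) d\lambda, \tag{3.7.43} \]

where

\[ f(\lambda) = \max\left[ a_P - g(\lambda),\, 0 \right] \quad (-\pi \le \lambda \le \pi), \tag{3.7.44} \]

and $a_P > 0$ is specified by the equation

\[ \int_{-\pi}^{\pi} f(\lambda)\, d\lambda = P. \tag{3.7.45} \]

This formula will be used in Chapter 6. □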



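Step 3: Now, we are ready to complete the proof of Theorem 3.7.5. In view of (3.7.35) and the equivalence of the channels $\mathbf{W}$ and $\tilde{\mathbf{W}}$, it suffices to show the strong converse property of the channel $\tilde{\mathbf{W}}$ under the cost constraint (3.7.14). To this end, in the light of Theorem 3.6.1 and Theorem 3.7.1 with $P$ instead of $\Gamma$, it suffices to show

\[ \sup_{\tilde{\mathbf{X}} \in \mathcal{S}_P} \overline{I}(\tilde{\mathbf{X}};\tilde{\mathbf{Y}}) \le C_s(P|\mathbf{W}), \]

or equivalently (cf. Lemma 3.7.3),

\[ \sup_{\tilde{\mathbf{X}} \in \mathcal{S}_P} \overline{I}(\tilde{\mathbf{X}};\tilde{\mathbf{Y}}) \le \lim_{n\to\infty} \frac{1}{2n} \sum_{i=1}^{n} \log\left(1 + \frac{P_i^{(n)}}{N_i^{(n)}}\right), \tag{3.7.46} \]

where $\tilde{\mathbf{X}}$ denotes an input process for the channel $\tilde{\mathbf{W}}$ and $\tilde{\mathbf{Y}}$ denotes the channel output process of $\tilde{\mathbf{W}}$ with an input $\tilde{\mathbf{X}}$. First, let $\tilde{\mathbf{X}} = \{\tilde{X}^n\}_{n=1}^{\infty}$ be an arbitrary input such that $\tilde{\mathbf{X}} \in \mathcal{S}_P$ and $\tilde{\mathbf{Y}} = \{\tilde{Y}^n\}_{n=1}^{\infty}$ be the corresponding output via $\tilde{\mathbf{W}} = \{\tilde{W}^n\}_{n=1}^{\infty}$ due to the input $\tilde{\mathbf{X}}$. For simplicity, set

\[ i(\tilde{X}^n; \tilde{Y}^n) = \frac{1}{n} \log \frac{\tilde{W}^n(\tilde{Y}^n|\tilde{X}^n)}{P_{\tilde{Y}^n}(\tilde{Y}^n)}, \tag{3.7.47} \]

which we transform as

\[ i(\tilde{X}^n; \tilde{Y}^n) = \frac{1}{n} \log \frac{\tilde{W}^n(\tilde{Y}^n|\tilde{X}^n)}{P_{\overline{Y}^n}(\tilde{Y}^n)} - \frac{1}{n} \log \frac{P_{\tilde{Y}^n}(\tilde{Y}^n)}{P_{\overline{Y}^n}(\tilde{Y}^n)}, \tag{3.7.48} \]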



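where

\[ P_{\overline{Y}^n}(\mathbf{y}) = \prod_{i=1}^{n} P_{\overline{Y}_i}(y_i) \quad (\mathbf{y} = (y_1, y_2, \ldots, y_n) \in \mathcal{Y}^n), \tag{3.7.49} \]

and $P_{\overline{Y}_i}$ is the probability density of the output $\overline{Y}_i^{(n)}$ via the $i$-th component Gaussian channel in (3.7.34) due to the input $\overline{X}_i^{(n)}$ that attains the maximum of the mutual information $I(X_i^{(n)}; Y_i^{(n)})$ under the condition $E(X_i^{(n)})^2 \le P_i^{(n)}$ $(i = 1, 2, \ldots, n)$. Specifically, for $i = 1, 2, \ldots, n$,

\[ P_{\overline{Y}_i}(y) = \frac{1}{\sqrt{2\pi (P_i^{(n)} + N_i^{(n)})}} \exp\left( -\frac{y^2}{2(P_i^{(n)} + N_i^{(n)})} \right). \tag{3.7.50} \]

It is easy to check that

\[ \operatorname*{p-lim\,inf}_{n\to\infty} \frac{1}{n} \log \frac{P_{\tilde{Y}^n}(\tilde{Y}^n)}{P_{\overline{Y}^n}(\tilde{Y}^n)} \ge 0, \]

and hence, from (3.7.48),

\[ \begin{aligned} \operatorname*{p-lim\,sup}_{n\to\infty} i(\tilde{X}^n; \tilde{Y}^n) &\le \operatorname*{p-lim\,sup}_{n\to\infty} \frac{1}{n} \log \frac{\tilde{W}^n(\tilde{Y}^n|\tilde{X}^n)}{P_{\overline{Y}^n}(\tilde{Y}^n)} - \operatorname*{p-lim\,inf}_{n\to\infty} \frac{1}{n} \log \frac{P_{\tilde{Y}^n}(\tilde{Y}^n)}{P_{\overline{Y}^n}(\tilde{Y}^n)} \\ &\le \operatorname*{p-lim\,sup}_{n\to\infty} \frac{1}{n} \log \frac{\tilde{W}^n(\tilde{Y}^n|\tilde{X}^n)}{P_{\overline{Y}^n}(\tilde{Y}^n)}. \end{aligned} \tag{3.7.51} \]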



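On the other hand, setting

\[ \tilde{X}^n = (\tilde{X}_1^{(n)}, \tilde{X}_2^{(n)}, \ldots, \tilde{X}_n^{(n)}) \]

and noting (3.7.49) as well as the fact that the channel $\tilde{\mathbf{W}}$ is memoryless, we have

\[ \frac{1}{n} \log \frac{\tilde{W}^n(\tilde{Y}^n|\tilde{X}^n)}{P_{\overline{Y}^n}(\tilde{Y}^n)} = \frac{1}{n} \sum_{i=1}^{n} \log \frac{W_i(\tilde{Y}_i^{(n)}|\tilde{X}_i^{(n)})}{P_{\overline{Y}_i}(\tilde{Y}_i^{(n)})}, \tag{3.7.52} \]

where, in view of (3.7.34), for $i = 1, 2, \ldots, n$,

\[ W_i(y|x) = \frac{1}{\sqrt{2\pi N_i^{(n)}}} \exp\left( -\frac{(y - x)^2}{2N_i^{(n)}} \right) \quad (x \in \mathcal{X},\ y \in \mathcal{Y}). \tag{3.7.53} \]

Now, fix any realization $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ of $\tilde{X}^n$ and set

\[ I(\tilde{Y}_i^{(n)}|x_i) = \log \frac{W_i(\tilde{Y}_i^{(n)}|x_i)}{P_{\overline{Y}_i}(\tilde{Y}_i^{(n)})}. \tag{3.7.54} \]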




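Then, $I(\tilde{Y}_i^{(n)}|x_i)$ $(i = 1, 2, \ldots, n)$ are independent under the conditional distribution $\tilde{W}^n(\cdot|\mathbf{x})$ given $\tilde{X}^n = \mathbf{x}$, because the channel $\tilde{\mathbf{W}}$ is memoryless. Moreover, it follows from (3.7.50) and (3.7.53) that

\[ \log \frac{W_i(y|x)}{P_{\overline{Y}_i}(y)} = \frac{1}{2} \log\left(1 + \frac{P_i^{(n)}}{N_i^{(n)}}\right) + \frac{y^2}{2(P_i^{(n)} + N_i^{(n)})} - \frac{(y - x)^2}{2N_i^{(n)}}. \]

Therefore,

\[ E_{\mathbf{x}}\left[ \log \frac{W_i(\tilde{Y}_i^{(n)}|x_i)}{P_{\overline{Y}_i}(\tilde{Y}_i^{(n)})} \right] = \int W_i(y|x_i) \log \frac{W_i(y|x_i)}{P_{\overline{Y}_i}(y)}\, dy = \frac{1}{2} \log\left(1 + \frac{P_i^{(n)}}{N_i^{(n)}}\right) + \frac{x_i^2 - P_i^{(n)}}{2(P_i^{(n)} + N_i^{(n)})}, \]

where $E_{\mathbf{x}}$ denotes the conditional expectation under the conditional distribution $\tilde{W}^n(\cdot|\mathbf{x})$. Hence,

\[ \begin{aligned} E_{\mathbf{x}}\left[ \frac{1}{n} \sum_{i=1}^{n} \log \frac{W_i(\tilde{Y}_i^{(n)}|x_i)}{P_{\overline{Y}_i}(\tilde{Y}_i^{(n)})} \right] &= \frac{1}{2n} \sum_{i=1}^{n} \log\left(1 + \frac{P_i^{(n)}}{N_i^{(n)}}\right) + \sum_{i=1}^{n} \frac{x_i^2 - P_i^{(n)}}{2n(P_i^{(n)} + N_i^{(n)})} \\ &\le \frac{1}{2n} \sum_{i=1}^{n} \log\left(1 + \frac{P_i^{(n)}}{N_i^{(n)}}\right) + \frac{\sum_{i=1}^{n} x_i^2 - nP}{2n A_P^{(n)}}, \end{aligned} \]

where $A_P^{(n)} > 0$ is as specified by (3.7.38) and (3.7.39) and we have taken account of the fact that $P_i^{(n)} > 0$ (i.e., $A_P^{(n)} > N_i^{(n)}$) implies $P_i^{(n)} + N_i^{(n)} = A_P^{(n)}$, and $P_i^{(n)} = 0$ implies $N_i^{(n)} \ge A_P^{(n)}$. Then, in view of the cost constraint (3.7.14), we have

\[ E_{\mathbf{x}}\left[ \frac{1}{n} \sum_{i=1}^{n} \log \frac{W_i(\tilde{Y}_i^{(n)}|x_i)}{P_{\overline{Y}_i}(\tilde{Y}_i^{(n)})} \right] \le \frac{1}{2n} \sum_{i=1}^{n} \log\left(1 + \frac{P_i^{(n)}}{N_i^{(n)}}\right). \tag{3.7.55} \]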



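On the other hand, direct computation shows that the variance of $I(\tilde{Y}_i^{(n)}|x_i)$ under the conditional distribution $\tilde{W}^n(\cdot|\mathbf{x})$ is given by

\[ V_{\mathbf{x}}\left[ \log \frac{W_i(\tilde{Y}_i^{(n)}|x_i)}{P_{\overline{Y}_i}(\tilde{Y}_i^{(n)})} \right] = \frac{2 (P_i^{(n)})^2}{4 (P_i^{(n)} + N_i^{(n)})^2} + \frac{x_i^2 N_i^{(n)}}{(P_i^{(n)} + N_i^{(n)})^2} \le \frac{2}{4} + \frac{x_i^2}{P} \tag{3.7.56} \]

(where we have used (3.7.38) and (3.7.40), which give $N_i^{(n)}/(P_i^{(n)} + N_i^{(n)})^2 \le 1/A_P^{(n)} \le 1/P$). Since $I(\tilde{Y}_i^{(n)}|x_i)$ $(i = 1, 2, \ldots, n)$ are mutually independent under the conditional distribution $\tilde{W}^n(\cdot|\mathbf{x})$, it follows from (3.7.56) that

\[ V_{\mathbf{x}}\left[ \frac{1}{n} \sum_{i=1}^{n} \log \frac{W_i(\tilde{Y}_i^{(n)}|x_i)}{P_{\overline{Y}_i}(\tilde{Y}_i^{(n)})} \right] \le \frac{2}{4n} + \frac{\sum_{i=1}^{n} x_i^2}{n^2 P} \le \frac{2}{4n} + \frac{1}{n}, \tag{3.7.57} \]

where we have taken account of the cost constraint (3.7.14). Chebyshev's inequality together with (3.7.55) and (3.7.57) leads to

\[ \Pr\left\{ \frac{1}{n} \sum_{i=1}^{n} I(\tilde{Y}_i^{(n)}|x_i) > C_n + \delta \,\Big|\, \tilde{X}^n = \mathbf{x} \right\} \le \frac{3}{2n\delta^2}, \tag{3.7.58} \]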



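where $\delta > 0$ is an arbitrarily small constant and, for simplicity, we set

\[ C_n = \frac{1}{2n} \sum_{i=1}^{n} \log\left(1 + \frac{P_i^{(n)}}{N_i^{(n)}}\right). \tag{3.7.59} \]

We notice here that inequality (3.7.58) holds for all realizations $\mathbf{x}$ of $\tilde{X}^n$ with $\tilde{\mathbf{X}} = \{\tilde{X}^n\}_{n=1}^{\infty} \in \mathcal{S}_P$. Therefore,

\[ \Pr\left\{ \frac{1}{n} \sum_{i=1}^{n} I(\tilde{Y}_i^{(n)}|\tilde{X}_i^{(n)}) > C_n + \delta \right\} \le \frac{3}{2n\delta^2}. \tag{3.7.60} \]

Then, it follows from (3.7.52), (3.7.54) and (3.7.60) that

\[ \Pr\left\{ \frac{1}{n} \log \frac{\tilde{W}^n(\tilde{Y}^n|\tilde{X}^n)}{P_{\overline{Y}^n}(\tilde{Y}^n)} > C_n + \delta \right\} \le \frac{3}{2n\delta^2}. \]

Hence,

\[ \lim_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{\tilde{W}^n(\tilde{Y}^n|\tilde{X}^n)}{P_{\overline{Y}^n}(\tilde{Y}^n)} > C_n + \delta \right\} = 0. \]

Since $\delta > 0$ is arbitrary, we have

\[ \operatorname*{p-lim\,sup}_{n\to\infty} \frac{1}{n} \log \frac{\tilde{W}^n(\tilde{Y}^n|\tilde{X}^n)}{P_{\overline{Y}^n}(\tilde{Y}^n)} \le \limsup_{n\to\infty} C_n = \lim_{n\to\infty} C_n. \]

Thus, by (3.7.47) and (3.7.51), we have

\[ \operatorname*{p-lim\,sup}_{n\to\infty} \frac{1}{n} \log \frac{\tilde{W}^n(\tilde{Y}^n|\tilde{X}^n)}{P_{\tilde{Y}^n}(\tilde{Y}^n)} \le \lim_{n\to\infty} C_n, \]

that is,

\[ \overline{I}(\tilde{\mathbf{X}};\tilde{\mathbf{Y}}) \le \lim_{n\to\infty} C_n. \]

Since $\tilde{\mathbf{X}}$ was arbitrary as far as $\tilde{\mathbf{X}} \in \mathcal{S}_P$, we can conclude that

\[ \sup_{\tilde{\mathbf{X}} \in \mathcal{S}_P} \overline{I}(\tilde{\mathbf{X}};\tilde{\mathbf{Y}}) \le \lim_{n\to\infty} C_n = \lim_{n\to\infty} \frac{1}{2n} \sum_{i=1}^{n} \log\left(1 + \frac{P_i^{(n)}}{N_i^{(n)}}\right), \tag{3.7.61} \]

which completes the proof of the theorem.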



3.8 Joint Source-Channel Coding

In the channel coding problems treated so far we assumed that information transmitted through a channel (i.e., a message) is subject to a certain probability distribution (e.g., the uniform distribution) over a message set $\mathcal{M}_n = \{1, 2, \ldots, M_n\}$. However, since the objective of channel coding primarily consists in transmission of an output from a given source to a receiver through a channel with nearly no error, it is infeasible to assume that outputs from various kinds of sources are always generated subject to a specific probability distribution. This means that we need to change the method of channel coding according to the probability distributions of the sources. Furthermore, for sources with countably infinite source alphabets we cannot identify a finite message set with a source alphabet of infinite size.

One of the reasons why we have so far assumed a finite message set (subject to the uniform distribution, for example) lies in the assumption of two-stage coding. That is, we have implicitly assumed that an output from a source is first encoded into an element $m$ belonging to a message set $\mathcal{M}_n$ by the fixed-length coding given in Chapter 1 (source coding), and then such an $m \in \mathcal{M}_n$ is encoded into a channel input by the channel coding described in this section (channel coding). In particular, if the source satisfies the strong converse property and we use the optimal source coding at the first stage, the codeword resulting from the source coding (corresponding to a message in the channel coding) is generated subject to a probability distribution nearly equal to the uniform distribution (see Theorem 2.6.4).

Let us consider the situation in which we transmit a source output through 
a channel. We may realize the optimal information transmission by consid- 
ering source coding and channel coding separately and using the optimal 
source code and the optimal channel code in respective stages. This idea is 
called the separation principle. On the other hand, we may realize the op- 
timal information transmission by choosing the optimal encoder among all 
of the encoders that directly transform a source output into a channel input 
in a single stage. This idea is called joint source-channel coding. Since
under the separation principle a class of encoders that transform a source 
output into a channel input in a single stage is restricted to a class of two- 
stage encoders, it is quite natural to ask whether we can achieve the optimal 
coding performance under this restriction or not. We say that the “separa- 
tion principle holds” if we can achieve the optimal coding performance by 
the two-stage coding. For example, it is a classical result that the separation 
principle holds if a source is stationary and memoryless (or stationary and 
ergodic) and a channel is stationary and memoryless. That is, the separation 
theorem holds without exception for typical sources and channels that are 
usually treated. In fact, the separation principle enables us to separate the 
source coding from the channel coding and study the source coding and the 
channel coding as independent subjects. 

Does the separation principle hold for general sources and general channels 
treated in this book? If we jump to a conclusion, the answer to this question 
is “yes.” We will see in this section in what sense the answer is “yes.” 

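First, let $\mathcal{V}$ be a source alphabet (a countably infinite set). Denote by $\mathcal{X}$ and $\mathcal{Y}$ an input alphabet and an output alphabet of a channel, respectively ($\mathcal{X}$ and $\mathcal{Y}$ can be arbitrary sets). Suppose that an arbitrary general source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ and an arbitrary general channel $\mathbf{W} = \{W^n: \mathcal{X}^n \to \mathcal{Y}^n\}_{n=1}^{\infty}$ are given. We define an encoder $\varphi_n: \mathcal{V}^n \to \mathcal{X}^n$ and a decoder $\psi_n: \mathcal{Y}^n \to \mathcal{V}^n$ as arbitrary mappings. Setting $X^n = \varphi_n(V^n)$ and denoting by $Y^n$ the output from the channel $W^n$ with $X^n$ as the input,

\[ V^n \to X^n \to Y^n \tag{3.8.1} \]

clearly forms a Markov chain. We define the error probability by

\[ \varepsilon_n = \Pr\{V^n \ne \psi_n(Y^n)\} = \sum_{\mathbf{v} \in \mathcal{V}^n} P_{V^n}(\mathbf{v})\, W^n(\mathcal{D}^c(\mathbf{v})|\varphi_n(\mathbf{v})), \tag{3.8.2} \]

where for each $\mathbf{v} \in \mathcal{V}^n$, $\mathcal{D}(\mathbf{v}) \subset \mathcal{Y}^n$ denotes a disjoint decoding region for $\mathbf{v}$ and $\mathcal{D}^c(\mathbf{v})$ its complement. That is, $\varepsilon_n$ is defined as the average error probability with respect to the probability distribution $P_{V^n}$ of the source.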

Notice here that in this kind of joint source-channel coding the notion of the coding rates defined in §1.1 and §3.1 no longer generally makes sense. This means that the infimum achievable fixed-length coding rate $R_f(\mathbf{V})$ (Definition 1.1.2) and the channel capacity $C(\mathbf{W})$ (Definition 3.1.2) do not make sense in general. What makes sense is the existence of a pair $(\varphi_n, \psi_n)$ of an encoder and a decoder making the error probability $\varepsilon_n$ less than a certain value given in advance. For simplicity, we call a pair $(\varphi_n, \psi_n)$ of an encoder and a decoder with the error probability $\varepsilon_n$ an $(n, \varepsilon_n)$-code. Since we are mainly interested in $(n, \varepsilon_n)$-codes with $\varepsilon_n \to 0$ as $n \to \infty$, we give the following definition.



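Definition 3.8.1.

Source $\mathbf{V}$ is transmissible over channel $\mathbf{W}$

$\overset{\mathrm{def}}{\Longleftrightarrow}$ There exists an $(n, \varepsilon_n)$-code satisfying $\lim_{n\to\infty} \varepsilon_n = 0$.

For a pair $(\mathbf{V}, \mathbf{W})$ of a source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ and a channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$, we first investigate a condition under which $\mathbf{V}$ is transmissible over $\mathbf{W}$. To this end, we need the following lemma, which is a generalized version of Lemma 3.4.1.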



Lemma 3.8.1 (Generalization of Feinstein's lemma). Let $X^n$ be an arbitrary channel input ($X^n$ can be arbitrarily correlated with a source output $V^n$) and denote by $Y^n$ the output from a channel $W^n$ corresponding to $X^n$. Then, there exists an $(n, \varepsilon_n)$-code for $(V^n, W^n)$ satisfying

\[ \varepsilon_n \le \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} + \gamma \right\} + e^{-n\gamma} \tag{3.8.3} \]

for all $n = 1, 2, \ldots$, where $\gamma > 0$ is an arbitrary constant.
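To get a feeling for the bound (3.8.3), it can be evaluated exactly in a toy setting. The sketch below rests on assumptions that are not in the text: an i.i.d. Bernoulli source, a binary symmetric channel, and a uniform i.i.d. channel input chosen independently of the source (so that $P_{Y^n}$ is uniform). The function names and parameter values are illustrative, and logarithms are natural.

```python
import math

def binom_pmf(n, k, p):
    """Probability of k successes in n Bernoulli(p) trials."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def feinstein_bound(n, p_source, q_channel, gamma):
    """Right-hand side of (3.8.3) for a Bernoulli(p_source) source, a BSC(q_channel),
    and a uniform input independent of the source (an assumption of this sketch)."""
    prob = 0.0
    for k in range(n + 1):  # k = number of bit flips introduced by the channel
        # (1/n) log W^n(y|x) / P_{Y^n}(y), where P_{Y^n} is uniform on {0,1}^n
        a = math.log(2.0) + (k * math.log(q_channel) + (n - k) * math.log(1.0 - q_channel)) / n
        pk = binom_pmf(n, k, q_channel)
        for j in range(n + 1):  # j = number of ones in the source block
            # (1/n) log 1 / P_{V^n}(v)
            b = -(j * math.log(p_source) + (n - j) * math.log(1.0 - p_source)) / n
            if a <= b + gamma:
                prob += pk * binom_pmf(n, j, p_source)
    return prob + math.exp(-n * gamma)

bound_short = feinstein_bound(50, 0.05, 0.05, 0.1)
bound_long = feinstein_bound(200, 0.05, 0.05, 0.1)
print(bound_short, bound_long)
```

Here the source entropy rate (about 0.20 nats) is below the mutual information rate (about 0.49 nats), so the bound shrinks as the block length grows.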



Remark 3.8.1. In particular, if we consider the source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ subject to the uniform distribution over a message set $\mathcal{M}_n = \{1, 2, \ldots, M_n\}$, we have

\[ \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} = \frac{1}{n} \log M_n. \]

That is, the entropy-spectrum of the source exactly becomes a one-point spectrum with a single peak of probability one. Therefore, Lemma 3.8.1 is reduced to Lemma 3.4.1 for this source. □



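Proof of Lemma 3.8.1.

We prove this lemma by using the random coding argument given in the proof of Theorem 3.1.1 in §3.1. Note that we need to consider a general source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ instead of a message set. In particular, since the channel input $X^n$ is assumed to be arbitrarily correlated with $V^n$, a probability distribution for generating a random code generally depends on a source output $\mathbf{v} \in \mathcal{V}^n$. First, for each $\mathbf{v} \in \mathcal{V}^n$ we generate $\mathbf{x}(\mathbf{v}) \in \mathcal{X}^n$ randomly subject to a conditional probability distribution $P_{X^n|V^n}(\cdot|\mathbf{v})$ and define $\mathbf{x}(\mathbf{v})$ as a codeword for $\mathbf{v}$. That is, we define an encoder $\varphi_n: \mathcal{V}^n \to \mathcal{X}^n$ by $\mathbf{x}(\mathbf{v}) = \varphi_n(\mathbf{v})$. Here, we suppose that $\{\mathbf{x}(\mathbf{v}) \mid \mathbf{v} \in \mathcal{V}^n\}$ are independently generated. In order to define a decoder $\psi_n: \mathcal{Y}^n \to \mathcal{V}^n$, we set

\[ S_n = \left\{ (\mathbf{v}, \mathbf{x}, \mathbf{y}) \in \mathcal{Z}^n \,\Big|\, \frac{1}{n} \log \frac{W^n(\mathbf{y}|\mathbf{x})}{P_{Y^n}(\mathbf{y})} > \frac{1}{n} \log \frac{1}{P_{V^n}(\mathbf{v})} + \gamma \right\}, \tag{3.8.4} \]

\[ S_n(\mathbf{v}) = \left\{ (\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n \mid (\mathbf{v}, \mathbf{x}, \mathbf{y}) \in S_n \right\}, \tag{3.8.5} \]

where we set $\mathcal{Z}^n = \mathcal{V}^n \times \mathcal{X}^n \times \mathcal{Y}^n$ for simplicity. Suppose that a decoder receives $\mathbf{y} \in \mathcal{Y}^n$. We define the decoder by $\mathbf{v} = \psi_n(\mathbf{y})$ if there exists a unique $\mathbf{v} \in \mathcal{V}^n$ satisfying $(\mathbf{x}(\mathbf{v}), \mathbf{y}) \in S_n(\mathbf{v})$. If there exists no such $\mathbf{v}$, or there exists more than one such $\mathbf{v}$, we define $\psi_n(\mathbf{y}) \in \mathcal{V}^n$ arbitrarily. Then, the error probability (averaged with respect to the random code) $\overline{\varepsilon}_n$ caused by the pair $(\varphi_n, \psi_n)$ of the encoder and the decoder is given by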



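\[ \overline{\varepsilon}_n = \sum_{\mathbf{v} \in \mathcal{V}^n} P_{V^n}(\mathbf{v})\, \overline{\varepsilon}_n(\mathbf{v}), \tag{3.8.6} \]

where $\overline{\varepsilon}_n(\mathbf{v})$ denotes the error probability (averaged with respect to the random code) of a source output $\mathbf{v} \in \mathcal{V}^n$. We now evaluate $\overline{\varepsilon}_n(\mathbf{v})$ in the following way:

\[ \begin{aligned} \overline{\varepsilon}_n(\mathbf{v}) &\le \Pr\{(\mathbf{x}(\mathbf{v}), Y^n) \notin S_n(\mathbf{v})\} + \Pr\Big\{ \bigcup_{\mathbf{v}':\, \mathbf{v}' \ne \mathbf{v}} \{(\mathbf{x}(\mathbf{v}'), Y^n) \in S_n(\mathbf{v}')\} \Big\} \\ &\le \Pr\{(\mathbf{x}(\mathbf{v}), Y^n) \notin S_n(\mathbf{v})\} + \sum_{\mathbf{v}':\, \mathbf{v}' \ne \mathbf{v}} \Pr\{(\mathbf{x}(\mathbf{v}'), Y^n) \in S_n(\mathbf{v}')\}, \end{aligned} \tag{3.8.7} \]

where $Y^n$ denotes the channel output corresponding to the input $\mathbf{x}(\mathbf{v})$. Since the first term on the right-hand side of (3.8.7) can be written as

\[ A_n(\mathbf{v}) \equiv \Pr\{(\mathbf{x}(\mathbf{v}), Y^n) \notin S_n(\mathbf{v})\} = \sum_{(\mathbf{x}, \mathbf{y}) \notin S_n(\mathbf{v})} P_{X^n Y^n|V^n}(\mathbf{x}, \mathbf{y}|\mathbf{v}), \]

it follows that

\[ \begin{aligned} \sum_{\mathbf{v} \in \mathcal{V}^n} P_{V^n}(\mathbf{v}) A_n(\mathbf{v}) &= \sum_{\mathbf{v} \in \mathcal{V}^n} P_{V^n}(\mathbf{v}) \sum_{(\mathbf{x}, \mathbf{y}) \notin S_n(\mathbf{v})} P_{X^n Y^n|V^n}(\mathbf{x}, \mathbf{y}|\mathbf{v}) \\ &= \sum_{(\mathbf{v}, \mathbf{x}, \mathbf{y}) \notin S_n} P_{V^n X^n Y^n}(\mathbf{v}, \mathbf{x}, \mathbf{y}) \\ &= \Pr\{V^n X^n Y^n \notin S_n\}. \end{aligned} \tag{3.8.8} \]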

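On the other hand, by noting that $\mathbf{x}(\mathbf{v})$ is independent of $\mathbf{x}(\mathbf{v}')$ $(\mathbf{v}' \ne \mathbf{v})$ and therefore $Y^n$ is independent of $\mathbf{x}(\mathbf{v}')$, the second term on the right-hand side of (3.8.7) can be evaluated as

\[ \begin{aligned} B_n(\mathbf{v}) &\equiv \sum_{\mathbf{v}':\, \mathbf{v}' \ne \mathbf{v}} \Pr\{(\mathbf{x}(\mathbf{v}'), Y^n) \in S_n(\mathbf{v}')\} \\ &= \sum_{\mathbf{v}':\, \mathbf{v}' \ne \mathbf{v}} \sum_{(\mathbf{x}, \mathbf{y}) \in S_n(\mathbf{v}')} P_{Y^n|V^n}(\mathbf{y}|\mathbf{v})\, P_{X^n|V^n}(\mathbf{x}|\mathbf{v}') \\ &\le \sum_{\mathbf{v}' \in \mathcal{V}^n} \sum_{(\mathbf{x}, \mathbf{y}) \in S_n(\mathbf{v}')} P_{Y^n|V^n}(\mathbf{y}|\mathbf{v})\, P_{X^n|V^n}(\mathbf{x}|\mathbf{v}'). \end{aligned} \]

Therefore, it follows that

\[ \begin{aligned} \sum_{\mathbf{v} \in \mathcal{V}^n} P_{V^n}(\mathbf{v}) B_n(\mathbf{v}) &\le \sum_{\mathbf{v} \in \mathcal{V}^n} \sum_{\mathbf{v}' \in \mathcal{V}^n} \sum_{(\mathbf{x}, \mathbf{y}) \in S_n(\mathbf{v}')} P_{V^n}(\mathbf{v})\, P_{Y^n|V^n}(\mathbf{y}|\mathbf{v})\, P_{X^n|V^n}(\mathbf{x}|\mathbf{v}') \\ &= \sum_{\mathbf{v}' \in \mathcal{V}^n} \sum_{(\mathbf{x}, \mathbf{y}) \in S_n(\mathbf{v}')} P_{Y^n}(\mathbf{y})\, P_{X^n|V^n}(\mathbf{x}|\mathbf{v}'). \end{aligned} \tag{3.8.9} \]

We notice here that (3.8.4) and (3.8.5) imply that

\[ P_{Y^n}(\mathbf{y}) \le P_{V^n}(\mathbf{v}')\, W^n(\mathbf{y}|\mathbf{x})\, e^{-n\gamma} \]

for $(\mathbf{x}, \mathbf{y}) \in S_n(\mathbf{v}')$. Then, (3.8.9) can be bounded as

\[ \begin{aligned} \sum_{\mathbf{v} \in \mathcal{V}^n} P_{V^n}(\mathbf{v}) B_n(\mathbf{v}) &\le e^{-n\gamma} \sum_{\mathbf{v}' \in \mathcal{V}^n} \sum_{(\mathbf{x}, \mathbf{y}) \in S_n(\mathbf{v}')} P_{V^n}(\mathbf{v}')\, P_{X^n|V^n}(\mathbf{x}|\mathbf{v}')\, W^n(\mathbf{y}|\mathbf{x}) \\ &\le e^{-n\gamma} \sum_{(\mathbf{v}', \mathbf{x}, \mathbf{y}) \in \mathcal{Z}^n} P_{V^n}(\mathbf{v}')\, P_{X^n|V^n}(\mathbf{x}|\mathbf{v}')\, W^n(\mathbf{y}|\mathbf{x}) \\ &= e^{-n\gamma}. \end{aligned} \tag{3.8.10} \]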



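Hence, in view of (3.8.6), (3.8.8) and (3.8.10), we obtain

\[ \begin{aligned} \overline{\varepsilon}_n &= \sum_{\mathbf{v} \in \mathcal{V}^n} P_{V^n}(\mathbf{v})\, \overline{\varepsilon}_n(\mathbf{v}) \\ &\le \sum_{\mathbf{v} \in \mathcal{V}^n} P_{V^n}(\mathbf{v}) A_n(\mathbf{v}) + \sum_{\mathbf{v} \in \mathcal{V}^n} P_{V^n}(\mathbf{v}) B_n(\mathbf{v}) \\ &\le \Pr\{V^n X^n Y^n \notin S_n\} + e^{-n\gamma}. \end{aligned} \]

Consequently, there must exist at least one deterministic $(n, \varepsilon_n)$-code satisfying

\[ \varepsilon_n \le \Pr\{V^n X^n Y^n \notin S_n\} + e^{-n\gamma}, \]

which completes the proof of this lemma. □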



Lemma 3.8.1 immediately leads to the following theorem: 



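Theorem 3.8.1 (Direct theorem). Let $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ be a source and $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ a channel. If for a channel input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ ($X^n$ can be arbitrarily correlated with the source output $V^n$) and a sequence $\{\gamma_n\}_{n=1}^{\infty}$ satisfying

\[ \gamma_n > 0, \quad \gamma_n \to 0 \quad \text{and} \quad n\gamma_n \to \infty \quad \text{as } n \to \infty \tag{3.8.11} \]

it holds that

\[ \lim_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} + \gamma_n \right\} = 0, \tag{3.8.12} \]

then the source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is transmissible over the channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$, where $Y^n$ denotes the channel output corresponding to $X^n$.

Proof. We note that the constant $\gamma > 0$ in Lemma 3.8.1 can be dependent on $n$. If we choose an arbitrary $\{\gamma_n\}_{n=1}^{\infty}$ satisfying the condition (3.8.11) as $\gamma$, such a $\{\gamma_n\}_{n=1}^{\infty}$ guarantees that the second term on the right-hand side of (3.8.3) goes to zero as $n \to \infty$. Since the first term also goes to zero by virtue of (3.8.12), the whole right-hand side of (3.8.3) goes to zero as $n \to \infty$, and the $(n, \varepsilon_n)$-code described in Lemma 3.8.1 satisfies $\lim_{n\to\infty} \varepsilon_n = 0$. □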

Next, in order to establish the converse theorem we prepare the following lemma, which is a generalized version of Lemma 3.2.2. In this lemma, however, so as to make it applicable to arguments in §6.8 in Chapter 6 we consider an encoder that does not uniquely determine a codeword $\varphi_n(\mathbf{v})$ for each $\mathbf{v} \in \mathcal{V}^n$ but generates a codeword subject to a probability distribution $P_{X^n|V^n}(\cdot|\mathbf{v})$ dependent on $\mathbf{v}$. Such an encoder is called a stochastic encoder. It is clear that (3.8.1) holds for the stochastic encoders.



Lemma 3.8.2 (Generalization of Verdú-Han lemma). Let $V^n$ be a source output and $W^n$ a channel. Let $\varphi_n$ be the encoder of an $(n, \varepsilon_n)$-code for $(V^n, W^n)$ and set $X^n = \varphi_n(V^n)$. Denoting by $Y^n$ the output from the channel $W^n$ with $X^n$ as the input, for all $n = 1, 2, \ldots$ we have

\[ \varepsilon_n \ge \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} - \gamma \right\} - e^{-n\gamma}, \tag{3.8.13} \]

where the encoder $\varphi_n$ can be stochastic and $\gamma > 0$ is an arbitrary constant.
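The lower bound (3.8.13) can likewise be evaluated in a toy case where the exact error probability is known. The sketch below assumes (these choices are not from the text) a uniform binary source of block length $n$ sent uncoded over a binary symmetric channel with crossover probability $q < 1/2$, decoded by the identity map, so that the block error probability is $1 - (1 - q)^n$. Names and parameters are illustrative; logarithms are natural.

```python
import math

def verdu_han_lower_bound(n, q, gamma):
    """Right-hand side of (3.8.13) for a uniform binary source sent uncoded over a BSC(q)."""
    prob = 0.0
    for k in range(n + 1):  # k = number of bit flips introduced by the channel
        # (1/n) log W^n(y|x) / P_{Y^n}(y); the output is uniform because the input is
        a = math.log(2.0) + (k * math.log(q) + (n - k) * math.log(1.0 - q)) / n
        # (1/n) log 1 / P_{V^n}(v) for the uniform source on {0,1}^n
        b = math.log(2.0)
        if a <= b - gamma:
            prob += math.comb(n, k) * q ** k * (1.0 - q) ** (n - k)
    return prob - math.exp(-n * gamma)

def uncoded_block_error(n, q):
    """Exact block error probability of uncoded transmission with identity decoding."""
    return 1.0 - (1.0 - q) ** n

lower = verdu_han_lower_bound(50, 0.05, 0.1)
exact = uncoded_block_error(50, 0.05)
print(lower, exact)
```

In this example the source entropy rate ($\log 2$) exceeds the mutual information rate, so the bound stays close to the true error probability and away from zero.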



Remark 3.8.2. In particular, if we consider the source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ subject to the uniform distribution over a message set $\mathcal{M}_n = \{1, 2, \ldots, M_n\}$, we have

\[ \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} = \frac{1}{n} \log M_n. \]

That is, the entropy-spectrum of the source exactly becomes a one-point spectrum with a single peak of probability one. Therefore, Lemma 3.8.2 is reduced to Lemma 3.2.2 for this source. □



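Proof of Lemma 3.8.2.

We can prove this lemma basically in the same way as Lemma 3.2.2. However, we need to note here that we consider a stochastic encoder and a general source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ instead of a message set $\mathcal{M}_n$. First, set

\[ L_n = \left\{ (\mathbf{v}, \mathbf{x}, \mathbf{y}) \in \mathcal{Z}^n \,\Big|\, \frac{1}{n} \log \frac{W^n(\mathbf{y}|\mathbf{x})}{P_{Y^n}(\mathbf{y})} \le \frac{1}{n} \log \frac{1}{P_{V^n}(\mathbf{v})} - \gamma \right\}, \tag{3.8.14} \]

and for each $\mathbf{v} \in \mathcal{V}^n$ define

\[ \mathcal{D}(\mathbf{v}) = \{\mathbf{y} \in \mathcal{Y}^n \mid \psi_n(\mathbf{y}) = \mathbf{v}\}, \]

which is the decoding region for $\mathbf{v}$. In addition, for each $(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n$ set

\[ B(\mathbf{v}, \mathbf{x}) = \{\mathbf{y} \in \mathcal{Y}^n \mid (\mathbf{v}, \mathbf{x}, \mathbf{y}) \in L_n\}. \tag{3.8.15} \]

By using the Markov chain (3.8.1), it follows that

\[ \begin{aligned} \Pr\{V^n X^n Y^n \in L_n\} &= \sum_{(\mathbf{v}, \mathbf{x}, \mathbf{y}) \in L_n} P_{V^n X^n Y^n}(\mathbf{v}, \mathbf{x}, \mathbf{y}) \\ &= \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n} P_{V^n X^n}(\mathbf{v}, \mathbf{x})\, W^n(B(\mathbf{v}, \mathbf{x})|\mathbf{x}) \\ &= \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n} P_{V^n X^n}(\mathbf{v}, \mathbf{x})\, W^n(B(\mathbf{v}, \mathbf{x}) \cap \mathcal{D}^c(\mathbf{v})|\mathbf{x}) \\ &\quad + \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n} P_{V^n X^n}(\mathbf{v}, \mathbf{x})\, W^n(B(\mathbf{v}, \mathbf{x}) \cap \mathcal{D}(\mathbf{v})|\mathbf{x}) \\ &\le \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n} P_{V^n X^n}(\mathbf{v}, \mathbf{x})\, W^n(\mathcal{D}^c(\mathbf{v})|\mathbf{x}) \\ &\quad + \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n} P_{V^n X^n}(\mathbf{v}, \mathbf{x})\, W^n(B(\mathbf{v}, \mathbf{x}) \cap \mathcal{D}(\mathbf{v})|\mathbf{x}) \\ &= \varepsilon_n + \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n} P_{V^n X^n}(\mathbf{v}, \mathbf{x})\, W^n(B(\mathbf{v}, \mathbf{x}) \cap \mathcal{D}(\mathbf{v})|\mathbf{x}) \\ &= \varepsilon_n + \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n} P_{V^n X^n}(\mathbf{v}, \mathbf{x}) \sum_{\mathbf{y} \in B(\mathbf{v}, \mathbf{x}) \cap \mathcal{D}(\mathbf{v})} W^n(\mathbf{y}|\mathbf{x}), \end{aligned} \tag{3.8.16} \]

where the fourth equality follows since the error probability can be written as

\[ \varepsilon_n = \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n} P_{V^n X^n}(\mathbf{v}, \mathbf{x})\, W^n(\mathcal{D}^c(\mathbf{v})|\mathbf{x}) \]

for a stochastic encoder $\varphi_n$. We notice here that (3.8.14) and (3.8.15) imply

\[ W^n(\mathbf{y}|\mathbf{x}) \le \frac{e^{-n\gamma} P_{Y^n}(\mathbf{y})}{P_{V^n}(\mathbf{v})} \]

for $\mathbf{y} \in B(\mathbf{v}, \mathbf{x})$. By substituting this into the right-hand side of (3.8.16), we obtain

\[ \begin{aligned} \Pr\{V^n X^n Y^n \in L_n\} &\le \varepsilon_n + e^{-n\gamma} \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n} P_{X^n|V^n}(\mathbf{x}|\mathbf{v}) \sum_{\mathbf{y} \in B(\mathbf{v}, \mathbf{x}) \cap \mathcal{D}(\mathbf{v})} P_{Y^n}(\mathbf{y}) \\ &\le \varepsilon_n + e^{-n\gamma} \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n} P_{X^n|V^n}(\mathbf{x}|\mathbf{v})\, P_{Y^n}(\mathcal{D}(\mathbf{v})) \\ &= \varepsilon_n + e^{-n\gamma} \sum_{\mathbf{v} \in \mathcal{V}^n} P_{Y^n}(\mathcal{D}(\mathbf{v})) \\ &= \varepsilon_n + e^{-n\gamma}, \end{aligned} \]

where the last equality holds because the decoding regions $\mathcal{D}(\mathbf{v})$ are disjoint and cover $\mathcal{Y}^n$. This completes the proof of this lemma. □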



Lemma 3.8.2 immediately implies the following theorem: 



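Theorem 3.8.2 (Converse theorem). If a source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is transmissible over a channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$, then for a channel input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ ($X^n$ can be arbitrarily correlated with the source output $V^n$) and any sequence $\{\gamma_n\}_{n=1}^{\infty}$ satisfying the condition (3.8.11), it holds that

\[ \lim_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} - \gamma_n \right\} = 0, \tag{3.8.17} \]

where $Y^n$ is the channel output corresponding to the input $X^n$.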



Proof. If $\mathbf{V}$ is transmissible over $\mathbf{W}$, due to Definition 3.8.1 there exists an $(n, \varepsilon_n)$-code satisfying $\lim_{n\to\infty} \varepsilon_n = 0$. Let $\varphi_n$ be the (stochastic) encoder of this code and set $X^n = \varphi_n(V^n)$. Denote by $Y^n$ the channel output corresponding to $X^n$. Then, the claim of the theorem immediately follows from (3.8.13) in Lemma 3.8.2, using $\gamma_n$ as $\gamma$. □



Remark 3.8.3 (Necessary and sufficient condition). The difference between (3.8.12) in Theorem 3.8.1 and (3.8.17) in Theorem 3.8.2 only appears in the signs of $\gamma_n$. If we neglect this difference of the signs, we may say that the combination of Theorem 3.8.1 and Theorem 3.8.2 essentially provides a “necessary and sufficient condition” under which a source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is transmissible over a channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$. □



Let us consider here the meanings of (3.8.12) and (3.8.17). First, we con- 
sider (3.8.12). Setting 



1 ^ W^(Y^\X^) 



1 . 1 

Bn = - log • 



n "’Pvn(r«) 
for simplicity, (3.8.12) can be written as 

an = Pr {An < Bn-\- 7n} 0 as n ^ oo. 

By expressing this as 

P^ {An ^ Bn T Tn} 

= ^ Pr {Bn = u} Pr [An < + 7 „| 5 „ = u} 



(3.8.18) 



Pr [Bn = u} Pr {A„ <u + jn\Bn= u} 



and setting 

Tn = {u\ Pr {An <u-\- ^n\Bn = u} < ^/^} , (3.8.19) 

it follows from (3.8.18) and Markov’s inequality (see Remark 1 . 1.1 in § 1.1 in 
Chapter 1) that 

Pr {Bn eTn}>l- (3.8.20) 

If we define the upper cumulative probabilities of An and Bn defined by 
Pn{t) = Pr {An > t} , Qn{t) = Pr {Bn > t} 
for any real number t, we have 



256 



3 Channel Coding 



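\begin{align*}
P_n(t) &= \sum_u \Pr\{B_n = u\}\,\Pr\{A_n \ge t \mid B_n = u\} \\
&\ge \sum_{u \in T_n:\, u \ge t - \gamma_n} \Pr\{B_n = u\}\,\Pr\{A_n \ge t \mid B_n = u\} \\
&\ge \sum_{u \in T_n:\, u \ge t - \gamma_n} \Pr\{B_n = u\}\,\Pr\{A_n > u + \gamma_n \mid B_n = u\}. \tag{3.8.21}
\end{align*}

Since (3.8.19) guarantees that

\[ \Pr\{A_n > u + \gamma_n \mid B_n = u\} \ge 1 - \sqrt{\alpha_n} \]

for $u \in T_n$, (3.8.20) and (3.8.21) yield

\begin{align*}
P_n(t) &\ge (1 - \sqrt{\alpha_n}) \sum_{u \in T_n:\, u \ge t - \gamma_n} \Pr\{B_n = u\} \\
&\ge (1 - \sqrt{\alpha_n})\bigl(Q_n(t - \gamma_n) - \Pr\{B_n \notin T_n\}\bigr) \\
&\ge (1 - \sqrt{\alpha_n})\bigl(Q_n(t - \gamma_n) - \sqrt{\alpha_n}\bigr) \\
&\ge Q_n(t - \gamma_n) - 2\sqrt{\alpha_n}.
\end{align*}

That is, if we neglect a difference of up to $2\sqrt{\alpha_n} \to 0$ as $n \to \infty$, $P_n(t)$, the upper cumulative probability of $A_n$, must be greater than or equal to $Q_n(t - \gamma_n)$, the upper cumulative probability of $B_n$, for any $t$ (see Fig. 3.4). This means that the mutual information spectrum of the channel lies to the right of the entropy-spectrum of the source. If we replace $\gamma_n$ with $-\gamma_n$ and repeat the argument above, the same implication also follows from (3.8.17).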
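The bound $P_n(t) \ge Q_n(t-\gamma_n) - 2\sqrt{\alpha_n}$ can be checked numerically on a toy example. The following Python sketch is only an illustration: the finite joint distribution of $(A_n, B_n)$, the function name `check_spectrum_bound`, and the value of `gamma` are assumptions chosen for the demonstration and do not come from the text.

```python
import math
import random

def check_spectrum_bound(joint, gamma):
    """joint maps (a, b) -> probability that (A_n, B_n) = (a, b).

    Returns (alpha, worst), where alpha = Pr{A <= B + gamma} and worst is
    the minimum over t of P(t) - (Q(t - gamma) - 2*sqrt(alpha)).
    """
    alpha = sum(p for (a, b), p in joint.items() if a <= b + gamma)

    def upper_a(t):  # P_n(t) = Pr{A >= t}
        return sum(p for (a, _), p in joint.items() if a >= t)

    def upper_b(t):  # Q_n(t) = Pr{B >= t}
        return sum(p for (_, b), p in joint.items() if b >= t)

    # Both step functions change only at these breakpoints.
    ts = sorted({a for a, _ in joint} | {b + gamma for _, b in joint})
    worst = min(upper_a(t) - (upper_b(t - gamma) - 2 * math.sqrt(alpha)) for t in ts)
    return alpha, worst

# Hypothetical toy distribution on a 6 x 6 grid; pairs with a <= b + gamma
# are given small weight so that alpha is small and the bound is informative.
gamma = 0.05
rng = random.Random(0)
values = [0.1 * i for i in range(6)]
weights = {(a, b): rng.random() * (0.01 if a <= b + gamma else 1.0)
           for a in values for b in values}
total = sum(weights.values())
joint = {ab: w / total for ab, w in weights.items()}

alpha, worst = check_spectrum_bound(joint, gamma)
print(f"alpha = {alpha:.4f}, minimum slack = {worst:.4f}")
```

A nonnegative minimum slack means the inequality holds at every breakpoint, and hence for every real $t$, since both cumulative probabilities are step functions.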



Fig. 3.4. Entropy spectrum and mutual information spectrum.



We have seen the meaning of the “necessary and sufficient condition” 
(3.8.12) and (3.8.17) in terms of the information-spectrum. This kind of re- 
lationship of location of the two information-spectra enables us to encode a 
source output directly into a channel output in a single stage. 

However, we can actually obtain two stronger results on the relative location of the information-spectra from (3.8.12) and (3.8.17), which are equivalent to (3.8.12) and (3.8.17), respectively, by choosing the input variable $X^n$ and the output variable $Y^n$ again. These results, described in the following two theorems, are useful for characterizing the relationship between the “separation principle” and the information-spectra. First, we have the following theorem giving an expression for the two information-spectra that is equivalent to the sufficient condition in Theorem 3.8.1 (direct theorem).



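Theorem 3.8.3 (Equivalence of the sufficient conditions). The following two conditions are equivalent:

1) For a channel input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ ($X^n$ can be arbitrarily correlated with a source output $V^n$) and a sequence $\{\gamma_n\}$ satisfying (3.8.11), it holds that

\[ \lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} + \gamma_n \right\} = 0, \tag{3.8.22} \]

where $Y^n$ denotes the channel output corresponding to the input $X^n$.

2) (Strict domination: Vembu, Verdú and Steinberg [89]) For a sequence $\{\gamma_n\}$ satisfying (3.8.11), there exist a channel input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and a sequence $\{c_n\}$ such that

\[ \lim_{n\to\infty} \left( \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge c_n \right\} + \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le c_n + \gamma_n \right\} \right) = 0, \tag{3.8.23} \]

where $Y^n$ denotes the channel output corresponding to the input $X^n$.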



Remark 3.8.4. The equivalence between conditions 1) and 2) in Theorem 3.8.3 is still valid if a “sequence $\{\gamma_n\}$” is replaced with a “constant $\gamma > 0$” that is independent of $n$. □



Remark 3.8.5. Condition 2) in Theorem 3.8.3 means that, in the asymptotic sense, the entropy-spectrum of a source is completely separated from the mutual information spectrum of a channel by a gap of width $\gamma_n$, and the former lies to the left of the latter (however, condition 2) does not exclude the possibility that the two separated information-spectra oscillate synchronously as $n$ increases). If condition 2) holds, we can first encode the source output $V^n$ by a fixed-length coding of rate $c_n = \frac{1}{n}\log M_n$ and then encode the output into a channel input by treating the output as an element of a message set (two-stage coding). The error probability $\varepsilon_n$ of this two-stage coding is upper bounded by the sum of

\[ \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge c_n \right\}, \]

the average error probability of the source coding (see Lemma 1.3.1 in §1.3 in Chapter 1), and

\[ \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le c_n + \gamma_n \right\} + e^{-n\gamma_n}, \]

the maximum error probability of the channel coding (see Lemma 3.4.1 and Remark 3.4.2 in §3.4). Equation (3.8.23) guarantees that these error probabilities go to zero as $n$ goes to infinity, which implies $\lim_{n\to\infty} \varepsilon_n = 0$ (note that $e^{-n\gamma_n} \to 0$). This means that the source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is transmissible over the channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$. This argument gives an alternative proof of Theorem 3.8.1. □



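Proof of Theorem 3.8.3.

2) $\Rightarrow$ 1): For any joint probability distribution $P_{V^nX^n}$ of $V^n$ and $X^n$ we have

\begin{align*}
\Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} + \gamma_n \right\}
&\le \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge c_n \right\} \\
&\quad + \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le c_n + \gamma_n \right\}.
\end{align*}

In view of (3.8.23) this means (3.8.22).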
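1) $\Rightarrow$ 2): Assume that 1) holds and set

\[ \alpha_n = \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} + \gamma_n \right\}. \tag{3.8.24} \]

Set $\gamma'_n = \frac{\gamma_n}{4}$ and $\delta_n = \max\left(\sqrt{\alpha_n},\, e^{-n\gamma'_n}\right)$. Define

\[ d_n = \sup\left\{ R \,\middle|\, \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge R \right\} \ge \delta_n \right\}. \tag{3.8.25} \]

Furthermore, define

\[ S_n = \left\{ v \in \mathcal{V}^n \,\middle|\, \frac{1}{n}\log\frac{1}{P_{V^n}(v)} \ge d_n \right\}, \tag{3.8.26} \]

\[ \lambda_n^{(1)} = \Pr\{V^n \in S_n\}, \qquad \lambda_n^{(2)} = \Pr\{V^n \notin S_n\}. \tag{3.8.27} \]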

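Then, the joint probability distribution $P_{V^nX^nY^n}$ can be written in the form of a mixed distribution as follows:

\[ P_{V^nX^nY^n}(v, x, y) = \lambda_n^{(1)} P_{\tilde V^n \tilde X^n \tilde Y^n}(v, x, y) + \lambda_n^{(2)} P_{\hat V^n \hat X^n \hat Y^n}(v, x, y), \tag{3.8.28} \]

where $P_{\tilde V^n \tilde X^n \tilde Y^n}$ and $P_{\hat V^n \hat X^n \hat Y^n}$ denote the conditional probability distributions of $V^nX^nY^n$ given $V^n \in S_n$ and $V^n \notin S_n$, respectively. We note here that, due to the Markov chain $V^n \to X^n \to Y^n$, we have $P_{\tilde Y^n|\tilde X^n} = P_{\hat Y^n|\hat X^n} = W^n$, and $\tilde V^n \to \tilde X^n \to \tilde Y^n$ and $\hat V^n \to \hat X^n \to \hat Y^n$ form Markov chains. Now, let us express (3.8.24) as

\begin{align*}
\alpha_n &= \lambda_n^{(1)} \Pr\left\{ \frac{1}{n}\log\frac{W^n(\tilde Y^n|\tilde X^n)}{P_{Y^n}(\tilde Y^n)} \le \frac{1}{n}\log\frac{1}{P_{V^n}(\tilde V^n)} + \gamma_n \right\} \\
&\quad + \lambda_n^{(2)} \Pr\left\{ \frac{1}{n}\log\frac{W^n(\hat Y^n|\hat X^n)}{P_{Y^n}(\hat Y^n)} \le \frac{1}{n}\log\frac{1}{P_{V^n}(\hat V^n)} + \gamma_n \right\}. \tag{3.8.29}
\end{align*}

Since (3.8.25) and (3.8.26) yield $\lambda_n^{(1)} \ge \delta_n \ge \sqrt{\alpha_n}$, it follows from (3.8.29) that

\[ \Pr\left\{ \frac{1}{n}\log\frac{W^n(\tilde Y^n|\tilde X^n)}{P_{Y^n}(\tilde Y^n)} \le \frac{1}{n}\log\frac{1}{P_{V^n}(\tilde V^n)} + \gamma_n \right\} \le \sqrt{\alpha_n}. \tag{3.8.30} \]

Then, (3.8.30) implies

\[ \Pr\left\{ \frac{1}{n}\log\frac{W^n(\tilde Y^n|\tilde X^n)}{P_{Y^n}(\tilde Y^n)} \le d_n + \gamma_n \right\} \le \sqrt{\alpha_n}, \tag{3.8.31} \]

because

\[ \frac{1}{n}\log\frac{1}{P_{V^n}(\tilde V^n)} \ge d_n \]

owing to the definition of $\tilde V^n$. Notice here that (3.8.28) implies that

\begin{align*}
P_{Y^n}(y) &= \lambda_n^{(1)} P_{\tilde Y^n}(y) + \lambda_n^{(2)} P_{\hat Y^n}(y) \\
&\ge \lambda_n^{(1)} P_{\tilde Y^n}(y) \\
&\ge \delta_n P_{\tilde Y^n}(y) \\
&\ge e^{-n\gamma'_n} P_{\tilde Y^n}(y).
\end{align*}

Thus, we have

\[ \frac{1}{n}\log\frac{1}{P_{Y^n}(\tilde Y^n)} \le \frac{1}{n}\log\frac{1}{P_{\tilde Y^n}(\tilde Y^n)} + \gamma'_n. \]

By substituting this into (3.8.31), we obtain

\[ \Pr\left\{ \frac{1}{n}\log\frac{W^n(\tilde Y^n|\tilde X^n)}{P_{\tilde Y^n}(\tilde Y^n)} \le d_n + \gamma_n - \gamma'_n \right\} \le \sqrt{\alpha_n}. \tag{3.8.32} \]

On the other hand, it holds that

\[ \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge d_n + 2\gamma'_n \right\} \le \delta_n \tag{3.8.33} \]

due to the definition of $d_n$ in (3.8.25). By setting $c_n = d_n + 2\gamma'_n$ and noting that $\delta_n \to 0$, $\alpha_n \to 0$ as $n \to \infty$ and $\gamma'_n = \frac{\gamma_n}{4}$, (3.8.32) and (3.8.33) yield

\[ \lim_{n\to\infty} \left( \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge c_n \right\} + \Pr\left\{ \frac{1}{n}\log\frac{W^n(\tilde Y^n|\tilde X^n)}{P_{\tilde Y^n}(\tilde Y^n)} \le c_n + \frac{1}{4}\gamma_n \right\} \right) = 0. \]

Finally, defining $\tilde X^n \tilde Y^n$ as $X^nY^n$ and $\frac{1}{4}\gamma_n$ as $\gamma_n$ establishes condition 2) in (3.8.23). □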



The following theorem gives an expression for the two information-spectra that is equivalent to the necessary condition in Theorem 3.8.2 (converse theorem).

Theorem 3.8.4 (Equivalence of the necessary conditions). The following two conditions are equivalent:

1) For any sequence $\{\gamma_n\}$ satisfying (3.8.11), there exists a channel input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ ($X^n$ can be arbitrarily correlated with a source output $V^n$) such that

\[ \lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} - \gamma_n \right\} = 0, \tag{3.8.34} \]

where $Y^n$ denotes the channel output corresponding to the input $X^n$.

2) (Domination) For any sequence $\{\gamma_n\}$ satisfying (3.8.11), there exist a channel input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and a sequence $\{c_n\}$ such that

\[ \lim_{n\to\infty} \left( \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge c_n \right\} + \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le c_n - \gamma_n \right\} \right) = 0, \tag{3.8.35} \]

where $Y^n$ denotes the channel output corresponding to the input $X^n$.



Proof. This theorem can be proved similarly to Theorem 3.8.3 by replacing $\gamma_n$ with $-\gamma_n$. □

Remark 3.8.6. The equivalence between conditions 1) and 2) in Theorem 3.8.4 is still valid if a “sequence $\{\gamma_n\}$” is replaced with a “constant $\gamma > 0$” that is independent of $n$. □



Remark 3.8.7. Condition 2) in Theorem 3.8.4 means that, if we neglect an overlap of width at most $\gamma_n$, in an asymptotic sense the entropy-spectrum of a source is completely separated from the mutual information spectrum of a channel and the former lies to the left of the latter (Fig. 3.5). Clearly, the necessary condition 2) in Theorem 3.8.4 (domination) is weaker than the sufficient condition in Theorem 3.8.3 (strict domination). However, the gap between these two becomes crucial only in a “singular” case where the two information-spectra concentrate in the neighborhood of a boundary with width at most $2\gamma_n$. Hence, unless such a singular case occurs, the “separation principle” given at the beginning of this chapter holds for a general source $\mathbf{V}$ and a general channel $\mathbf{W}$ asymptotically for each $n$ by virtue of Remark 3.8.5, Theorem 3.8.2 and Theorem 3.8.4. □



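Fig. 3.5. Entropy spectrum and mutual information spectrum.

Remark 3.8.8. Actually, Vembu, Verdú and Steinberg [89] defined the domination differently from condition 2) in Theorem 3.8.4, in the following way:

2') (Domination) For any sequence $\{d_n\}$ and any sequence $\{\gamma_n\}$ satisfying condition (3.8.11), there exists a channel input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ satisfying

\[ \lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge d_n \right\} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le d_n - \gamma_n \right\} = 0, \tag{3.8.36} \]

where $Y^n$ denotes the channel output corresponding to the input $X^n$.

We can obtain condition 2') as a consequence of the necessary condition 2) in Theorem 3.8.4. To see this, set

\[ \lambda_n = \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge d_n \right\}, \tag{3.8.37} \]

\[ \mu_n = \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le d_n - \gamma_n \right\}, \tag{3.8.38} \]

\[ \alpha_n = \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge c_n \right\}, \tag{3.8.39} \]

\[ \beta_n = \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le c_n - \gamma_n \right\}. \tag{3.8.40} \]

Then, we observe that $\lambda_n \le \alpha_n$ if $d_n \ge c_n$, and $\mu_n \le \beta_n$ if $d_n < c_n$, and hence it follows from condition 2) that $\lambda_n \times \mu_n \le \alpha_n + \beta_n \to 0$ as $n$ tends to $\infty$. Therefore, condition 2) implies condition 2'), which means that condition 2) is stronger than or equivalent to 2') as a necessary condition.

On the other hand, it is currently not known whether condition 2') implies condition 2). □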



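Remark 3.8.9. So far we have only treated the case where the error probability satisfies $\lim_{n\to\infty} \varepsilon_n = 0$. We can also treat the weaker case where the error probability satisfies

\[ \limsup_{n\to\infty} \varepsilon_n \le \varepsilon \tag{3.8.41} \]

for an arbitrary constant $\varepsilon$ with $0 \le \varepsilon < 1$ (clearly, setting $\varepsilon = 0$ leads to the coding we have treated so far). We say that a source $\mathbf{V}$ is $\varepsilon$-transmissible over a channel $\mathbf{W}$ if there exists an $(n, \varepsilon_n)$-code satisfying (3.8.41). Then, we have the following theorems corresponding to Theorem 3.8.1 and Theorem 3.8.2, which can be easily established along the same lines as above.

Theorem 3.8.5 ($\varepsilon$-Direct theorem). Let $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ be a source and $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ a channel. If for a channel input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ ($X^n$ can be arbitrarily correlated with the source output $V^n$) and a sequence $\{\gamma_n\}$ satisfying (3.8.11) it holds that

\[ \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} + \gamma_n \right\} \le \varepsilon, \tag{3.8.42} \]

then the source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is $\varepsilon$-transmissible over the channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$, where $Y^n$ denotes the channel output corresponding to the input $X^n$. □

Theorem 3.8.6 ($\varepsilon$-Converse theorem). If a source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is $\varepsilon$-transmissible over a channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$, then for a channel input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ ($X^n$ can be arbitrarily correlated with the source output $V^n$) and any sequence $\{\gamma_n\}$ satisfying (3.8.11) it holds that

\[ \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} - \gamma_n \right\} \le \varepsilon, \tag{3.8.43} \]

where $Y^n$ denotes the channel output corresponding to the input $X^n$. □

Note that it is difficult to express a sufficient condition and a necessary condition for $\varepsilon$-achievability in generalized forms of the strict domination (3.8.23) and the domination (3.8.35). □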



3.9 Separation Theorems of the Traditional Type

Thus far in this chapter we have investigated the joint source-channel coding problem from the viewpoint of information spectra and established the theorems (Theorems 3.8.1-3.8.4). These results appear to differ in form from separation theorems of the traditional type. It is therefore natural to ask how the separation principle of the information-spectrum type is related to the separation theorem of the traditional type. In this section we address this question.

Let us first record a typical separation theorem of the traditional type. A general source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is said to be information-stable (cf. Dobrushin [23], Pinsker [76]) if

\[ \frac{\frac{1}{n}\log\frac{1}{P_{V^n}(V^n)}}{H_n(V^n)} \to 1 \quad \text{in prob.}, \tag{3.9.1} \]

where $H_n(V^n) = \frac{1}{n}H(V^n)$ and $H(V^n)$ stands for the entropy of $V^n$ (cf. Cover and Thomas [17]). Moreover, a general channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ is said to be information-stable (cf. Dobrushin [23], Hu [51]) if there exists a channel input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ such that

\[ \frac{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)}}{C_n(W^n)} \to 1 \quad \text{in prob.}, \tag{3.9.2} \]

where

\[ C_n(W^n) = \sup_{X^n} \frac{1}{n} I(X^n; Y^n), \]

$Y^n$ is the channel output via $W^n$ due to the channel input $X^n$, and $I(X^n; Y^n)$ is the mutual information between $X^n$ and $Y^n$ (cf. Cover and Thomas [17]). Then, we can summarize a typical separation theorem of the traditional type as follows.



Theorem 3.9.1 (Dobrushin [23], Pinsker [76]). Let the channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ be information-stable and suppose that the limit $\lim_{n\to\infty} C_n(W^n)$ exists, or, let the source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ be information-stable and suppose that the limit $\lim_{n\to\infty} H_n(V^n)$ exists. Then, the following two statements hold:

1) If $R_f(\mathbf{V}) < C(\mathbf{W})$, then the source $\mathbf{V}$ is transmissible over the channel $\mathbf{W}$. In this case, we can separate the source coding and the channel coding.

2) If the source $\mathbf{V}$ is transmissible over the channel $\mathbf{W}$, then it must hold that $R_f(\mathbf{V}) \le C(\mathbf{W})$. □



In order to generalize Theorem 3.9.1, we need to introduce the concept of optimistic coding as follows (cf. Definition 1.1.1, Definition 1.1.2, Definition 3.1.1 and Definition 3.1.2): The “optimistic” standpoint means that we evaluate the coding reliability with $\liminf_{n\to\infty} \varepsilon_n = 0$ (that is, for every $\varepsilon > 0$, $\varepsilon_n < \varepsilon$ for infinitely many $n$). In contrast with this, the standpoint that we have taken so far is called pessimistic, which means that we evaluate the coding reliability with $\lim_{n\to\infty} \varepsilon_n = 0$ (that is, for every $\varepsilon > 0$, $\varepsilon_n < \varepsilon$ for all sufficiently large $n$).
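The difference between the two standpoints can be seen on an oscillating error sequence. The sketch below is a toy illustration only: the sequence `oscillating_eps` and the helper `tail_extremes` are hypothetical and are not derived from any actual code construction. Over a long tail, the smallest error is close to zero (the optimistic criterion is met) while the largest error stays at 0.5 (the pessimistic criterion is not).

```python
def tail_extremes(eps, start, stop):
    """Return (min, max) of eps(n) over start <= n < stop."""
    tail = [eps(n) for n in range(start, stop)]
    return min(tail), max(tail)

def oscillating_eps(n):
    # Hypothetical error sequence: good codes at even n, poor codes at odd n.
    return 1.0 / n if n % 2 == 0 else 0.5

tail_min, tail_max = tail_extremes(oscillating_eps, 1000, 2000)
print(tail_min, tail_max)
```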

The first one concerns the optimistic source coding with any general 
source V. 

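Definition 3.9.1 (Optimistic achievability).

Rate $R$ is optimistically achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists an $(n, M_n, \varepsilon_n)$-code satisfying $\liminf_{n\to\infty} \varepsilon_n = 0$ and

\[ \limsup_{n\to\infty} \frac{1}{n}\log M_n \le R. \]

Definition 3.9.2 (Optimistic infimum achievable fixed-length coding rate).

\[ \underline{R}_f(\mathbf{V}) = \inf\{R \mid R \text{ is optimistically achievable}\}. \]

Then, for any general source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ we have:

Theorem 3.9.2 (Chen and Alajaji [15]).

\[ \underline{R}_f(\mathbf{V}) = \inf\left\{ R \,\middle|\, \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge R \right\} = 0 \right\}. \tag{3.9.3} \]

Proof. It suffices to parallel the proof of Theorem 1.3.1 with due modifications. □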



The second one concerns the optimistic channel coding with any general 
channel W. 

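Definition 3.9.3 (Optimistic achievability).

Rate $R$ is optimistically achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists an $(n, M_n, \varepsilon_n)$-code satisfying $\liminf_{n\to\infty} \varepsilon_n = 0$ and

\[ \liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R. \]

Definition 3.9.4 (Optimistic channel capacity).

\[ \overline{C}(\mathbf{W}) = \sup\{R \mid R \text{ is optimistically achievable}\}. \]

Then, for any general channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ we have

Theorem 3.9.3 (Vembu, Verdú and Steinberg [89]; Chen and Alajaji [15]).

\[ \overline{C}(\mathbf{W}) = \sup_{\mathbf{X}} \sup\left\{ R \,\middle|\, \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R \right\} = 0 \right\}, \tag{3.9.4} \]

where $Y^n$ is the output due to the input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$.

Proof. It suffices to parallel the proof of Theorem 3.2.1 with due modifications. □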



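Remark 3.9.1. It is easy to check that, in parallel with Theorem 3.9.2 and Theorem 3.9.3, Theorem 1.3.1 and Theorem 3.2.1 can be written as

\[ R_f(\mathbf{V}) = \inf\left\{ R \,\middle|\, \lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge R \right\} = 0 \right\}, \tag{3.9.5} \]

\[ C(\mathbf{W}) = \sup_{\mathbf{X}} \sup\left\{ R \,\middle|\, \lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R \right\} = 0 \right\}, \tag{3.9.6} \]

from which, together with Theorem 3.9.2 and Theorem 3.9.3, it immediately follows that

\[ C(\mathbf{W}) \le \overline{C}(\mathbf{W}), \tag{3.9.7} \]

\[ \underline{R}_f(\mathbf{V}) \le R_f(\mathbf{V}). \tag{3.9.8} \]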



Now, we have:

Theorem 3.9.4. Let $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ be any general channel and $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ be any general source. Then, the following two statements hold:

1) If $R_f(\mathbf{V}) < C(\mathbf{W})$, then the source $\mathbf{V}$ is transmissible over the channel $\mathbf{W}$. In this case, we can separate the source coding and the channel coding.

2) If the source $\mathbf{V}$ is transmissible over the channel $\mathbf{W}$, then it must hold that

\[ \underline{R}_f(\mathbf{V}) \le C(\mathbf{W}), \tag{3.9.9} \]

\[ R_f(\mathbf{V}) \le \overline{C}(\mathbf{W}). \tag{3.9.10} \]

Remark 3.9.2. Inequality (3.9.10) was given by Vembu, Verdú and Steinberg [89]. □



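Proof of Theorem 3.9.4.

1): Since $R_f(\mathbf{V}) = \overline{H}(\mathbf{V})$ and $C(\mathbf{W}) = \sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y})$ by Theorem 1.3.1 and Theorem 3.2.1, the inequality $R_f(\mathbf{V}) < C(\mathbf{W})$ implies that condition 2) in Theorem 3.8.3 holds for $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ attaining the supremum $\sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y})$ with, for example, $c_n = \frac{1}{2}\bigl(R_f(\mathbf{V}) + C(\mathbf{W})\bigr)$. Therefore, the source $\mathbf{V}$ is transmissible over the channel $\mathbf{W}$.

2): If the source $\mathbf{V}$ is transmissible over the channel $\mathbf{W}$, then condition 2) in Theorem 3.8.4 holds, i.e.,

\[ \lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge c_n \right\} = 0, \tag{3.9.11} \]

\[ \lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le c_n - \gamma_n \right\} = 0. \tag{3.9.12} \]

Since $\lim_{n\to\infty} \gamma_n = 0$, these two conditions with any small constant $\delta > 0$ lead us to the following formulas:

\[ \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge \liminf_{n\to\infty} c_n + \delta \right\} = 0, \tag{3.9.13} \]

\[ \lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge \limsup_{n\to\infty} c_n + \delta \right\} = 0, \tag{3.9.14} \]

\[ \lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \liminf_{n\to\infty} c_n - \delta \right\} = 0, \tag{3.9.15} \]

\[ \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \limsup_{n\to\infty} c_n - \delta \right\} = 0. \tag{3.9.16} \]

Then, Theorem 3.9.2 and (3.9.13) imply that $\underline{R}_f(\mathbf{V}) \le \liminf_{n\to\infty} c_n$, whereas (3.9.15) implies that $\underline{I}(\mathbf{X};\mathbf{Y}) \ge \liminf_{n\to\infty} c_n$. Therefore, by Theorem 3.2.1 we have $\underline{R}_f(\mathbf{V}) \le \liminf_{n\to\infty} c_n \le \underline{I}(\mathbf{X};\mathbf{Y}) \le \sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y}) = C(\mathbf{W})$. On the other hand, (3.9.14) implies that $\overline{H}(\mathbf{V}) \le \limsup_{n\to\infty} c_n$. Furthermore, (3.9.16) together with Theorem 3.9.3 gives us

\[ \overline{H}(\mathbf{V}) \le \limsup_{n\to\infty} c_n \le \overline{C}(\mathbf{W}). \]

Finally, note that $R_f(\mathbf{V}) = \overline{H}(\mathbf{V})$ by Theorem 1.3.1. □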



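We are now interested in the problem of what conditions are needed to attain the equalities $\underline{R}_f(\mathbf{V}) = R_f(\mathbf{V})$ and/or $C(\mathbf{W}) = \overline{C}(\mathbf{W})$ in Theorem 3.9.4. To see this, we need the following two definitions:

Definition 3.9.5. A general source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is said to satisfy the semi-strong converse property if for all divergent subsequences $\{n_i\}_{i=1}^{\infty}$ of positive integers such that $n_1 < n_2 < \cdots \to \infty$ it holds that

\[ \operatorname*{p\text{-}limsup}_{i\to\infty} \frac{1}{n_i}\log\frac{1}{P_{V^{n_i}}(V^{n_i})} = \overline{H}(\mathbf{V}). \tag{3.9.17} \]

Definition 3.9.6. A general channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ is said to satisfy the semi-strong converse property if for any channel input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and for all divergent subsequences $\{n_i\}_{i=1}^{\infty}$ of positive integers such that $n_1 < n_2 < \cdots \to \infty$ it holds that

\[ \operatorname*{p\text{-}liminf}_{i\to\infty} \frac{1}{n_i}\log\frac{W^{n_i}(Y^{n_i}|X^{n_i})}{P_{Y^{n_i}}(Y^{n_i})} \le \sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y}), \tag{3.9.18} \]

where $Y^n$ is the channel output via $W^n$ due to the channel input $X^n$. □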



With these definitions, we have:

Lemma 3.9.1.

1) A general source $\mathbf{V}$ satisfies the semi-strong converse property if and only if

\[ \underline{R}_f(\mathbf{V}) = R_f(\mathbf{V}). \tag{3.9.19} \]

2) A general channel $\mathbf{W}$ satisfies the semi-strong converse property if and only if

\[ C(\mathbf{W}) = \overline{C}(\mathbf{W}). \tag{3.9.20} \]

Proof. It is obvious in view of Theorem 3.9.2, Theorem 3.9.3 and Remark 3.9.1. □



Remark 3.9.3. Originally, Csiszár and Körner [19] posed two operational standpoints in source coding and channel coding, i.e., the pessimistic standpoint and the optimistic standpoint. In their terminology, Lemma 3.9.1 states that, for source coding, the semi-strong converse property is equivalent to the condition that both the pessimistic standpoint and the optimistic standpoint result in the same infimum of all achievable fixed-length source coding rates; similarly, for channel coding, the semi-strong converse property is equivalent to the condition that both the pessimistic standpoint and the optimistic standpoint result in the same supremum of all achievable channel coding rates. □



Thus, Theorem 3.9.4 together with Lemma 3.9.1 immediately yields the following stronger separation theorem of the traditional type:

Theorem 3.9.5. Let either the source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ or the channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ satisfy the semi-strong converse property. Then, the following two statements hold:

1) If $R_f(\mathbf{V}) < C(\mathbf{W})$, then the source $\mathbf{V}$ is transmissible over the channel $\mathbf{W}$. In this case, we can separate the source coding and the channel coding.

2) If the source $\mathbf{V}$ is transmissible over the channel $\mathbf{W}$, then it must hold that $R_f(\mathbf{V}) \le C(\mathbf{W})$. □



Remark 3.9.4. Vembu, Verdú and Steinberg [89] first demonstrated an
equivalent of Theorem 3.9.5 in terms of another kind of operational notions 
instead of the semi-strong converse property here (cf. Lemma 3.9.1). 

From the above arguments it is easy to check the validity of the following 
statements: 

1) The information-stability of a source V (resp. channel W) with the limit 
implies the strong converse property of V (resp. W). 

2) The strong converse property of a source V (resp. channel W) implies the 
semi-strong converse property of V (resp. W). 

Therefore, Theorem 3.9.1 is derived as a special case from Theorem 3.9.5. □ 



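Example 3.9.1. Let us consider two different stationary memoryless sources $\mathbf{V}_1 = \{V_1^n\}_{n=1}^{\infty}$, $\mathbf{V}_2 = \{V_2^n\}_{n=1}^{\infty}$ with countably infinite source alphabet $\mathcal{V}$, and define their mixed source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ by

\[ P_{V^n}(\mathbf{v}) = \alpha_1 P_{V_1^n}(\mathbf{v}) + \alpha_2 P_{V_2^n}(\mathbf{v}) \quad (\mathbf{v} \in \mathcal{V}^n), \]

where $\alpha_1, \alpha_2$ are positive constants such that $\alpha_1 + \alpha_2 = 1$. Then, this mixed source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ satisfies the semi-strong converse property but neither the strong converse property nor the information-stability.

Similarly, let us consider two different stationary memoryless channels $\mathbf{W}_1 = \{W_1^n\}_{n=1}^{\infty}$, $\mathbf{W}_2 = \{W_2^n\}_{n=1}^{\infty}$ with arbitrary abstract input and output alphabets $\mathcal{X}, \mathcal{Y}$, and define their mixed channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ by

\[ W^n(\mathbf{y}|\mathbf{x}) = \alpha_1 W_1^n(\mathbf{y}|\mathbf{x}) + \alpha_2 W_2^n(\mathbf{y}|\mathbf{x}) \quad (\mathbf{x} \in \mathcal{X}^n,\ \mathbf{y} \in \mathcal{Y}^n). \]

Then, this mixed channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ satisfies the semi-strong converse property but neither the strong converse property nor the information-stability. □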
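To make the mixed-source example concrete, the entropy-spectrum of a mixture of two memoryless sources can be computed exactly when the alphabet is binary. The sketch below is an illustration under simplifying assumptions: a binary alphabet (instead of the countably infinite one in the example), hypothetical parameters `a1, p1, a2, p2`, and natural logarithms. It evaluates how much probability the normalized self-information $\frac{1}{n}\log\frac{1}{P_{V^n}(V^n)}$ places near each component entropy; the spectrum splits into two peaks with weights close to $\alpha_1$ and $\alpha_2$ rather than concentrating at a single point.

```python
import math

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) distribution."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def log_mixture_prob(n, k, a1, p1, a2, p2):
    """log P_{V^n}(v) for a binary sequence v containing k ones."""
    l1 = math.log(a1) + k * math.log(p1) + (n - k) * math.log(1 - p1)
    l2 = math.log(a2) + k * math.log(p2) + (n - k) * math.log(1 - p2)
    m = max(l1, l2)  # log-sum-exp to avoid underflow
    return m + math.log(math.exp(l1 - m) + math.exp(l2 - m))

def mass_near(n, a1, p1, a2, p2, center, width):
    """Pr{ |(1/n) log 1/P_{V^n}(V^n) - center| <= width } under the mixed source."""
    total = 0.0
    for k in range(n + 1):
        lp = log_mixture_prob(n, k, a1, p1, a2, p2)
        if abs(-lp / n - center) <= width:
            log_count = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            total += math.exp(log_count + lp)
    return total

# Hypothetical mixture: weights 0.3 / 0.7 on Bernoulli(0.1) / Bernoulli(0.4).
n, a1, p1, a2, p2 = 2000, 0.3, 0.1, 0.7, 0.4
mass1 = mass_near(n, a1, p1, a2, p2, binary_entropy(p1), 0.1)
mass2 = mass_near(n, a1, p1, a2, p2, binary_entropy(p2), 0.1)
print(round(mass1, 3), round(mass2, 3))
```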



4 Hypothesis Testing 



4.1 Hypothesis Testing 

For two given general sources $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ we consider the hypothesis testing problem with the null hypothesis $\mathbf{X}$ and the alternative hypothesis $\overline{\mathbf{X}}$. This problem is also called the hypothesis testing of $\mathbf{X}$ against $\overline{\mathbf{X}}$ for simplicity. Here, both $X^n$ and $\overline{X}^n$ are supposed to be $\mathcal{X}^n$-valued random variables, where $\mathcal{X}$ denotes a source alphabet. In ordinary hypothesis testing problems we choose a subset $\mathcal{A}_n \subset \mathcal{X}^n$ as an acceptance region. If $\mathbf{x}$, an output from one of the two sources, belongs to $\mathcal{A}_n$, then we judge that the null hypothesis $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is true. Otherwise, we judge that the alternative hypothesis $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ is true. We define the error probability of the first kind and the error probability of the second kind by

\[ \mu_n = \Pr\{X^n \notin \mathcal{A}_n\} \]

and

\[ \lambda_n = \Pr\{\overline{X}^n \in \mathcal{A}_n\}, \]

respectively. From the definitions above, $\mu_n$ is the probability that we misjudge the alternative hypothesis $\overline{\mathbf{X}}$ to be true when the null hypothesis $\mathbf{X}$ is actually true, and $\lambda_n$ is the probability that we misjudge $\mathbf{X}$ to be true when $\overline{\mathbf{X}}$ is actually true. The complement $\mathcal{C}_n = \mathcal{X}^n - \mathcal{A}_n$ is called the critical region of the hypothesis testing. Throughout this chapter suppose that the alphabet $\mathcal{X}$ is arbitrary ($\mathcal{X}$ can be countably infinite or abstract) unless stated otherwise.

The hypothesis testing is formulated as the problem of choosing an acceptance region $\mathcal{A}_n$ that makes the error probability of the second kind $\lambda_n$ as small as possible subject to the constraint that the error probability of the first kind $\mu_n$ is upper bounded by a given constant. The following two simple but powerful lemmas are useful for obtaining fundamental results on the hypothesis testing:

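Lemma 4.1.1. Define

\[ \mathcal{A}_n = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n}\log\frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} \ge t \right\} \]

for an arbitrary real number $t$.* Then, it holds that $\Pr\{\overline{X}^n \in \mathcal{A}_n\} \le e^{-nt}$.

Proof. Since it follows that

\begin{align*}
1 \ge \Pr\{X^n \in \mathcal{A}_n\} &= \sum_{\mathbf{x} \in \mathcal{A}_n} P_{X^n}(\mathbf{x}) \\
&\ge \sum_{\mathbf{x} \in \mathcal{A}_n} P_{\overline{X}^n}(\mathbf{x})\, e^{nt} \\
&= \Pr\{\overline{X}^n \in \mathcal{A}_n\}\, e^{nt},
\end{align*}

we have $\Pr\{\overline{X}^n \in \mathcal{A}_n\} \le e^{-nt}$. □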
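The bound of Lemma 4.1.1 can be checked numerically for a simple pair of memoryless binary sources. The sketch below is only an illustration: the Bernoulli parameters `p` (null hypothesis) and `q` (alternative), the block length `n`, and the helper names are assumptions made for the example. Because the likelihood ratio depends on a sequence only through its number of ones, the probability of the acceptance region can be summed over that count.

```python
import math

def llr_rate(n, k, p, q):
    """(1/n) log P_{X^n}(x) / P_{Xbar^n}(x) for a binary sequence with k ones."""
    return (k * math.log(p / q) + (n - k) * math.log((1 - p) / (1 - q))) / n

def second_kind_error(n, p, q, t):
    """Pr{Xbar^n in A_n} for A_n = {x : llr_rate >= t}, Xbar i.i.d. Bernoulli(q)."""
    return sum(math.comb(n, k) * q**k * (1 - q) ** (n - k)
               for k in range(n + 1) if llr_rate(n, k, p, q) >= t)

n, p, q = 40, 0.5, 0.2
for t in (-0.2, 0.0, 0.1, 0.3, 0.5):
    lam = second_kind_error(n, p, q, t)
    print(f"t = {t:+.1f}: lambda_n = {lam:.3e}, bound e^(-nt) = {math.exp(-n * t):.3e}")
```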

Lemma 4.1.2. For any real number $t$ and $\mathcal{A}_n \subset \mathcal{X}^n$, it holds that

\[ \Pr\{X^n \notin \mathcal{A}_n\} + e^{nt} \Pr\{\overline{X}^n \in \mathcal{A}_n\} \ge \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le t \right\}. \]

Remark 4.1.1. Lemma 4.1.2 can also be established as a consequence of the Neyman-Pearson lemma [74]. However, there is no essential difference between the claims of Lemma 4.1.2 and the Neyman-Pearson lemma if we consider the asymptotic situation with $n$ sufficiently large. □



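Proof of Lemma 4.1.2.
Set

\[ S_n = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n}\log\frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} \le t \right\}. \]

Then, it follows that

\begin{align*}
\Pr\{X^n \in S_n\} &= \Pr\{X^n \in S_n \cap \mathcal{A}_n^c\} + \Pr\{X^n \in S_n \cap \mathcal{A}_n\} \\
&\le \Pr\{X^n \notin \mathcal{A}_n\} + \Pr\{X^n \in S_n \cap \mathcal{A}_n\}.
\end{align*}

By noticing that $\mathbf{x} \in S_n$ implies $P_{X^n}(\mathbf{x}) \le P_{\overline{X}^n}(\mathbf{x})\, e^{nt}$, we have

\begin{align*}
\Pr\{X^n \in S_n \cap \mathcal{A}_n\} &= \sum_{\mathbf{x} \in S_n \cap \mathcal{A}_n} P_{X^n}(\mathbf{x}) \\
&\le \sum_{\mathbf{x} \in S_n \cap \mathcal{A}_n} P_{\overline{X}^n}(\mathbf{x})\, e^{nt} \\
&\le \sum_{\mathbf{x} \in \mathcal{A}_n} P_{\overline{X}^n}(\mathbf{x})\, e^{nt} \\
&= e^{nt} \Pr\{\overline{X}^n \in \mathcal{A}_n\},
\end{align*}

which completes the proof of the lemma. □

* In the case where the source alphabet $\mathcal{X}$ is abstract in general, it is understood that $g_n(\mathbf{x}) = \frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})}$ denotes the Radon-Nikodym derivative between probability measures on $\mathcal{X}^n$, with values on a singular set assumed conventionally to be $+\infty$. Then, $\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)}$ is defined to be $g_n(X^n)$, which is obviously a random variable.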
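Lemma 4.1.2 holds for every acceptance region, which can be spot-checked numerically. The following sketch is illustrative only: it assumes memoryless Bernoulli sources with hypothetical parameters `p` and `q`, and it restricts attention to acceptance regions determined by the number of ones in the sequence, drawn at random. The quantity `lemma_gap` is the left-hand side minus the right-hand side of the lemma and should never be negative, up to rounding.

```python
import math
import random

def pmf(n, k, p):
    """Binomial(n, p) probability of exactly k ones."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def lemma_gap(n, p, q, t, region):
    """LHS - RHS of Lemma 4.1.2 for A_n = {x : number of ones of x is in region}."""
    def rate(k):
        return (k * math.log(p / q) + (n - k) * math.log((1 - p) / (1 - q))) / n
    mu = sum(pmf(n, k, p) for k in range(n + 1) if k not in region)   # Pr{X^n not in A_n}
    lam = sum(pmf(n, k, q) for k in region)                           # Pr{Xbar^n in A_n}
    rhs = sum(pmf(n, k, p) for k in range(n + 1) if rate(k) <= t)     # Pr{rate <= t} under X
    return mu + math.exp(n * t) * lam - rhs

rng = random.Random(1)
n, p, q = 20, 0.5, 0.2
gaps = []
for _ in range(200):
    region = {k for k in range(n + 1) if rng.random() < 0.5}
    t = rng.uniform(-0.5, 1.0)
    gaps.append(lemma_gap(n, p, q, t, region))
print("smallest gap over 200 random regions:", min(gaps))
```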

Now, we give the definitions required for the formulation of the hypothesis testing with the null hypothesis $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and the alternative hypothesis $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$. In order to define the hypothesis testing problems we need to fix a constraint that $\mu_n$, the error probability of the first kind, must satisfy. We first consider the constraint that $\mu_n$ satisfies $\mu_n \to 0$ as $n \to \infty$. Since subject to this constraint the error probability of the second kind $\lambda_n$ can usually be written as

\[ \lambda_n \simeq e^{-nR} \quad (R > 0), \]

i.e., $\lambda_n$ goes to zero exponentially in the block length $n$, it is fundamental to consider how large we can make the exponent $R$. The following definitions formulate such a situation.

Definition 4.1.1.

Rate $R$ is achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists an acceptance region $\mathcal{A}_n$ satisfying

\[ \lim_{n\to\infty} \mu_n = 0 \quad \text{and} \quad \liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n} \ge R. \]

Definition 4.1.2 (Supremum achievable error probability exponent).

\[ B(\mathbf{X}\|\overline{\mathbf{X}}) = \sup\{R \mid R \text{ is achievable}\}. \]

In the hypothesis testing problems the random variable $\frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)}$ plays a crucial role. We call this the divergence density rate or the likelihood-ratio density rate, and its probability distribution the divergence-spectrum (or, more generally, the information-spectrum).

Here, we define:

Definition 4.1.3.

\[ \underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) = \operatorname*{p\text{-}liminf}_{n\to\infty} \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \]

and call $\underline{D}(\mathbf{X}\|\overline{\mathbf{X}})$ the spectral inf-divergence rate of $\mathbf{X}$ with respect to $\overline{\mathbf{X}}$.

Then, the spectral inf-divergence rate turns out to be nonnegative from its definition and Lemma 3.2.1, using $X^n$ and $\overline{X}^n$ instead of $U_n$ and $V_n$, respectively. That is, it holds that

\[ \underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) \ge 0. \]

We have the following fundamental theorem on $B(\mathbf{X}\|\overline{\mathbf{X}})$.

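Theorem 4.1.1 (Verdú [90]).

\[ B(\mathbf{X}\|\overline{\mathbf{X}}) = \underline{D}(\mathbf{X}\|\overline{\mathbf{X}}). \]

Proof.

1) Direct part:

Define $R = \underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) - \gamma$ for an arbitrary $\gamma > 0$ and consider the hypothesis testing with the acceptance region

\[ \mathcal{A}_n = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n}\log\frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} \ge R \right\}. \]

Then, the definition of $\underline{D}(\mathbf{X}\|\overline{\mathbf{X}})$ tells us that

\[ \mu_n = \Pr\{X^n \notin \mathcal{A}_n\} \to 0 \quad \text{as } n \to \infty. \]

On the other hand, Lemma 4.1.1 with $t = R$ implies that

\[ \Pr\{\overline{X}^n \in \mathcal{A}_n\} \le e^{-nR}, \]

i.e., $\lambda_n \le e^{-nR}$. Accordingly, we obtain

\[ \liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n} \ge R. \]

This establishes that $R = \underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) - \gamma$ is achievable for any $\gamma > 0$, which means $B(\mathbf{X}\|\overline{\mathbf{X}}) \ge \underline{D}(\mathbf{X}\|\overline{\mathbf{X}})$.

2) Converse part:

Suppose that $R$ is achievable. Then, there exists an acceptance region $\mathcal{A}_n$ satisfying $\lim_{n\to\infty} \mu_n = 0$ and $\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n} \ge R$. Hence, for any $\gamma > 0$ it follows that

\[ \frac{1}{n}\log\frac{1}{\lambda_n} \ge R - \gamma \quad (\forall n \ge n_0), \]

which leads to

\[ \lambda_n \le e^{-n(R-\gamma)} \quad (\forall n \ge n_0). \]

On the other hand, Lemma 4.1.2 with $t = R - 2\gamma$ implies that

\[ \mu_n + e^{n(R-2\gamma)} \lambda_n \ge \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R - 2\gamma \right\}. \]

Therefore, for all $n \ge n_0$ we have

\[ \mu_n + e^{-n\gamma} \ge \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R - 2\gamma \right\}. \]

Since $\lim_{n\to\infty} (\mu_n + e^{-n\gamma}) = 0$, it follows that

\[ \lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R - 2\gamma \right\} = 0. \]

Thus, we obtain

\[ \underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) \ge R - 2\gamma. \]

Since $\gamma > 0$ is arbitrary, $\underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) \ge R$ follows. Hence, $B(\mathbf{X}\|\overline{\mathbf{X}}) \le \underline{D}(\mathbf{X}\|\overline{\mathbf{X}})$ is established. □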



Here, as an application of Theorem 4.1.1, we consider the case where $\mathbf{X}$ and $\overline{\mathbf{X}}$ are the stationary memoryless sources subject to probability distributions $P_X$ and $P_{\overline{X}}$, respectively. By Khintchin's theorem (Theorem 1.3.2), the divergence-spectrum of $\frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)}$ converges to the one-point spectrum with a peak of probability one at $D(X\|\overline{X})$ as $n \to \infty$. This fact implies

\[ \underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) = D(X\|\overline{X}), \]

where $D(X\|\overline{X})$ denotes the divergence between $X$ and $\overline{X}$. As a consequence, we obtain the following well-known result (see also Theorem 4.3.2 below):

Corollary 4.1.1.

\[ B(\mathbf{X}\|\overline{\mathbf{X}}) = D(X\|\overline{X}). \tag{4.1.1} \]

The combination of this corollary with Corollary 4.2.1 in the following section is called Stein's lemma.
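The concentration of the divergence-spectrum behind Corollary 4.1.1 can be observed numerically. The sketch below is a hedged illustration, assuming memoryless Bernoulli sources with hypothetical parameters `p` and `q` and natural logarithms. It computes exactly how much probability the divergence density rate places within `delta` of the divergence $D(X\|\overline{X})$; this mass approaches one as $n$ grows.

```python
import math

def kl_bernoulli(p, q):
    """Divergence D(P||Q) in nats between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def spectrum_mass_near_divergence(n, p, q, delta):
    """Pr{ |(1/n) log P_{X^n}(X^n)/P_{Xbar^n}(X^n) - D| <= delta }, X i.i.d. Bernoulli(p)."""
    d = kl_bernoulli(p, q)
    total = 0.0
    for k in range(n + 1):
        rate = (k * math.log(p / q) + (n - k) * math.log((1 - p) / (1 - q))) / n
        if abs(rate - d) <= delta:
            # Binomial probability in the log domain to avoid underflow for large n.
            log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                       + k * math.log(p) + (n - k) * math.log(1 - p))
            total += math.exp(log_pmf)
    return total

p, q, delta = 0.5, 0.2, 0.1
for n in (20, 200, 2000):
    print(n, round(spectrum_mass_near_divergence(n, p, q, delta), 4))
```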

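Example 4.1.1 (Hypothesis testing for the mixed source). Suppose that the null hypothesis $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is the mixed source with probability distribution

\[ P_{X^n}(\mathbf{x}) = \alpha_1 P_{X_1^n}(\mathbf{x}) + \alpha_2 P_{X_2^n}(\mathbf{x}) \quad (\alpha_1 > 0,\ \alpha_2 > 0,\ \alpha_1 + \alpha_2 = 1) \]

and the alternative hypothesis $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ is not a mixed source. Setting $\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$ and $\mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}$, we obtain the formula

\[ B(\mathbf{X}\|\overline{\mathbf{X}}) = \underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) = \min\bigl(\underline{D}(\mathbf{X}_1\|\overline{\mathbf{X}}),\ \underline{D}(\mathbf{X}_2\|\overline{\mathbf{X}})\bigr) \tag{4.1.2} \]

(this formula can be verified by using the argument given in the proofs of Lemma 1.4.1 and Lemma 3.3.1). In particular, if $\mathbf{X}_1$, $\mathbf{X}_2$ and $\overline{\mathbf{X}}$ are the stationary memoryless sources subject to $P_{X_1}$, $P_{X_2}$ and $P_{\overline{X}}$, respectively, then the divergence-spectrum of $\frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)}$ converges to the two-point spectrum with two peaks of probabilities $\alpha_1$ and $\alpha_2$ at $D(X_1\|\overline{X})$ and $D(X_2\|\overline{X})$, respectively, as $n \to \infty$. Therefore, $B(\mathbf{X}\|\overline{\mathbf{X}})$ is given by

\[ B(\mathbf{X}\|\overline{\mathbf{X}}) = \min\bigl(D(X_1\|\overline{X}),\ D(X_2\|\overline{X})\bigr) \tag{4.1.3} \]

(see also Remark 4.4.3). □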



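Example 4.1.2 (Hypothesis testing for a nonstationary memoryless source). Let us consider the case where both $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ are memoryless sources without stationarity, under the assumption that $\mathcal{X}$ is a finite source alphabet. Letting $X^n = (X_1, X_2, \cdots, X_n)$ and $\overline{X}^n = (\overline{X}_1, \overline{X}_2, \cdots, \overline{X}_n)$ be the two memoryless sources, Theorem 4.1.1 and Chebyshev's inequality yield the formula

\[ B(\mathbf{X}\|\overline{\mathbf{X}}) = \underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) = \liminf_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} D(X_i\|\overline{X}_i). \]

For example, if

\[ P_{X_i} = \begin{cases} P_{X_1} & (i \text{ is odd}), \\ P_{X_2} & (i \text{ is even}), \end{cases} \qquad P_{\overline{X}_i} = \begin{cases} P_{\overline{X}_1} & (i \text{ is odd}), \\ P_{\overline{X}_2} & (i \text{ is even}), \end{cases} \]

then it is easy to see that

\[ B(\mathbf{X}\|\overline{\mathbf{X}}) = \underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) = \frac{1}{2} D(X_1\|\overline{X}_1) + \frac{1}{2} D(X_2\|\overline{X}_2). \]

In addition, for the set $J$ defined by (3.2.23) in Remark 3.2.3 in §3.2, if

\[ P_{X_i} = \begin{cases} P_{X_1} & \text{for } i \in J, \\ P_{X_2} & \text{for } i \notin J, \end{cases} \qquad P_{\overline{X}_i} = \begin{cases} P_{\overline{X}_1} & \text{for } i \in J, \\ P_{\overline{X}_2} & \text{for } i \notin J, \end{cases} \]

then we have

\begin{align*}
B(\mathbf{X}\|\overline{\mathbf{X}}) &= \min_{\frac{1}{3} \le \lambda \le \frac{2}{3}} \bigl(\lambda D(X_1\|\overline{X}_1) + (1-\lambda) D(X_2\|\overline{X}_2)\bigr) \\
&= \frac{2}{3}\min\bigl(D(X_1\|\overline{X}_1),\ D(X_2\|\overline{X}_2)\bigr) + \frac{1}{3}\max\bigl(D(X_1\|\overline{X}_1),\ D(X_2\|\overline{X}_2)\bigr).
\end{align*}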
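For the odd/even case of Example 4.1.2, the running average $\frac{1}{n}\sum_{i=1}^{n} D(X_i\|\overline{X}_i)$ can be evaluated directly. The sketch below is illustrative only: the Bernoulli parameters used for the two component divergences are hypothetical, and `average_divergence` simply counts how many odd and even indices occur up to $n$. The average settles at the midpoint of the two divergences, in agreement with the formula in the example.

```python
import math

def kl_bernoulli(p, q):
    """Divergence D(P||Q) in nats between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def average_divergence(n, d_odd, d_even):
    """(1/n) * sum_{i=1}^{n} D(X_i || Xbar_i) when the divergence alternates with the parity of i."""
    odd_terms = (n + 1) // 2   # indices 1, 3, 5, ...
    even_terms = n // 2        # indices 2, 4, 6, ...
    return (odd_terms * d_odd + even_terms * d_even) / n

d1 = kl_bernoulli(0.5, 0.2)   # hypothetical D(X_1 || Xbar_1), used at odd i
d2 = kl_bernoulli(0.3, 0.6)   # hypothetical D(X_2 || Xbar_2), used at even i
for n in (1, 10, 101, 10001):
    print(n, average_divergence(n, d1, d2))
print("midpoint:", 0.5 * (d1 + d2))
```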

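We can generalize the mixed source considered in Example 4.1.1 in the following way. For arbitrarily given infinitely many general sources $\mathbf{X}_i = \{X_i^n\}_{n=1}^{\infty}$ $(i = 1, 2, \cdots)$, we call the source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ defined by

\[ P_{X^n}(\mathbf{x}) = \sum_{i=1}^{\infty} \alpha_i P_{X_i^n}(\mathbf{x}) \quad (\forall n = 1, 2, \cdots;\ \forall \mathbf{x} \in \mathcal{X}^n) \tag{4.1.4} \]

the mixed source of the source family $\{\mathbf{X}_i\}_{i=1}^{\infty}$, where $\alpha_i$ $(i = 1, 2, \cdots)$ are constants satisfying

\[ \sum_{i=1}^{\infty} \alpha_i = 1 \quad (\alpha_i \ge 0;\ \forall i = 1, 2, \cdots). \]

We have the following lemma characterizing the spectral inf-divergence rate of such a mixed source $\mathbf{X}$ with respect to an arbitrarily given general source $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$.

Lemma 4.1.3. For the mixed source $\mathbf{X}$ defined in (4.1.4),

\[ \underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) = \inf_{i \ge 1:\, \alpha_i > 0} \underline{D}(\mathbf{X}_i\|\overline{\mathbf{X}}). \tag{4.1.5} \]

Proof. We have only to calculate the information-spectrum similarly to the proofs of Lemma 1.4.3 (§1.4) and Lemma 3.3.2 (§3.3). □

Theorem 4.1.1 and Lemma 4.1.3 immediately yield the following theorem.

Theorem 4.1.2. For the mixed source defined in (4.1.4),

\[ B(\mathbf{X}\|\overline{\mathbf{X}}) = \inf_{i \ge 1:\, \alpha_i > 0} \underline{D}(\mathbf{X}_i\|\overline{\mathbf{X}}). \tag{4.1.6} \]

If we consider the special case where all of $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ and $\mathbf{X}_i = \{X_i^n\}_{n=1}^{\infty}$ $(i = 1, 2, \cdots)$ are stationary memoryless sources, we obtain the following corollary from Theorem 4.1.2.

Corollary 4.1.2. Let $\mathcal{X}$ be an arbitrary (not necessarily countable) source alphabet. If $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ and $\mathbf{X}_i = \{X_i^n\}_{n=1}^{\infty}$ are the stationary memoryless sources subject to probability distributions $P_{\overline{X}}$ and $P_{X_i}$ $(i = 1, 2, \cdots)$, respectively, then

\[ B(\mathbf{X}\|\overline{\mathbf{X}}) = \inf_{i \ge 1:\, \alpha_i > 0} D(X_i\|\overline{X}) \tag{4.1.7} \]

for the mixed source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ defined in (4.1.4), where $D(X_i\|\overline{X})$ denotes the divergence.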

Next, let us consider a mixed source with a more general way of mixing (see §1.4 in Chapter 1). Let $\Lambda$ be an arbitrary set (probability space) and assign a general source $\mathbf{X}_\theta = \{X_\theta^n\}_{n=1}^{\infty}$ to each $\theta \in \Lambda$. Here, we assume that, denoting a source alphabet by $\mathcal{X}$, $P_{X_\theta^n}(A)$, the probability of $A$, is a measurable function of $\theta$ for all $n = 1, 2, \cdots$ and for all measurable sets $A \subset \mathcal{X}^n$. If we fix an arbitrary probability measure $w$ on $\Lambda$, we have a source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ subject to the probability distribution

\[ P_{X^n}(A) = \int_{\Lambda} P_{X_\theta^n}(A)\, dw(\theta) \quad (\forall n = 1, 2, \cdots). \tag{4.1.8} \]

This source is called the mixed source of the source family $\{\mathbf{X}_\theta\}_{\theta \in \Lambda}$. Furthermore, letting $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ be another general source, we define the following two functions of $R$ instead of the spectral inf-divergence rate:



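\[ \underline{K}(R|\mathbf{X}\|\overline{\mathbf{X}}) \equiv \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R \right\}, \tag{4.1.9} \]

\[ \overline{K}(R|\mathbf{X}\|\overline{\mathbf{X}}) \equiv \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R \right\}, \tag{4.1.10} \]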

each of which is determined from the divergence-spectrum itself. We attempt to characterize these two functions by using $w(\cdot)$. However, since such characterization is difficult for general sources $\mathbf{X}_\theta$ ($\theta \in \Lambda$) and $\overline{\mathbf{X}}$, we assume that $\mathcal{X}$ is a finite source alphabet and that $\overline{\mathbf{X}}$ and $\mathbf{X}_\theta$ ($\theta \in \Lambda$) are the stationary memoryless sources subject to probability distributions $P_{\overline{X}}$ and $P_{X_\theta}$ ($\theta \in \Lambda$), respectively (we use the notations $\overline{\mathbf{X}} = \{\overline{X}\}$ and $\mathbf{X}_\theta = \{X_\theta\}$ for simplicity). Then, we have the following lemma. This lemma corresponds to Lemma 1.4.4 in §1.4 and Lemma 3.3.3 in §3.3.

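Lemma 4.1.4. Let $\mathcal{X}$ be a finite source alphabet. If each $\mathbf{X}_\theta = \{X_\theta\}$ is stationary and memoryless and so is $\overline{\mathbf{X}} = \{\overline{X}\}$, we have

\[ \int_{\{\theta \,|\, D(X_\theta\|\overline{X}) < R\}} dw(\theta) \le \underline{K}(R|\mathbf{X}\|\overline{\mathbf{X}}) \le \overline{K}(R|\mathbf{X}\|\overline{\mathbf{X}}) \le \int_{\{\theta \,|\, D(X_\theta\|\overline{X}) \le R\}} dw(\theta) \quad (\forall R > 0) \tag{4.1.11} \]

for the mixed source $\mathbf{X}$ defined in (4.1.8), where $D(X_\theta\|\overline{X})$ denotes the divergence and the inequalities in (4.1.11) hold with equality except for at most countably many $R$.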

Proof. We can prove the lemma by calculating the information-spectrum similarly to the proofs of Lemma 1.4.4 (§1.4) and Lemma 3.3.3 (§3.3). □

Theorem 4.1.1 and Lemma 4.1.4 immediately yield the following theorem.

Theorem 4.1.3. For the mixed source $\mathbf{X}$ defined in Lemma 4.1.4,

\[ B(\mathbf{X}\|\overline{\mathbf{X}}) = w\text{-ess.inf}\; D(X_\theta\|\overline{X}). \tag{4.1.12} \]

Proof. We can prove the theorem similarly to the proof of Theorem 1.4.3 (§1.4) by using Lemma 4.1.4. □



Example 4.1.3. Let $\{P_{X_\theta}\}$ be a family of probability distributions with parameter $\theta$ over a finite source alphabet $\mathcal{X}$. For each $\theta$ denote by $\mathbf{X}_\theta$ the stationary memoryless source subject to the probability distribution $P_{X_\theta}$. Then, $\mathbf{x} = (x_1, \cdots, x_n) \in \mathcal{X}^n$ is generated with probability

\[ P_\theta(\mathbf{x}) = \prod_{i=1}^{n} P_{X_\theta}(x_i). \]

Now, let $w(\theta)$ be an arbitrary probability measure and denote by $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ the mixed source obtained by mixing $\{\mathbf{X}_\theta\}$ with respect to the probability density $w(\theta)$. Then, the probability distribution of $X^n$ is given by

\[ P_{X^n}(\mathbf{x}) = \int P_\theta(\mathbf{x})\, dw(\theta) \quad (\forall n = 1, 2, \cdots;\ \forall \mathbf{x} \in \mathcal{X}^n). \]

Suppose that $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ is a stationary memoryless source subject to a probability distribution $P_{\overline{X}}$. Then, Lemma 4.1.4 guarantees that, in the limit of $n \to \infty$, the divergence-spectrum of the mixed source $\mathbf{X}$ against the source $\overline{\mathbf{X}}$ is distributed along the horizontal axis $D(X_\theta\|\overline{X})$ with the probability density $w(\theta)$ (Fig. 4.1). Then, the divergence rate of $(\mathbf{X}, \overline{\mathbf{X}})$ (see Remark 4.3.3 in §4.3) is computed as

\[ D(\mathbf{X}\|\overline{\mathbf{X}}) = \lim_{n\to\infty} \frac{1}{n} D(X^n\|\overline{X}^n) = \int D(X_\theta\|\overline{X})\, dw(\theta). \tag{4.1.13} \]

Fig. 4.1.

In particular, the divergence-spectrum of Example 4.1.1 becomes the two-point spectrum with the two peaks of probabilities $\alpha_1$ and $\alpha_2$ at $D(X_1\|\overline{X})$ and $D(X_2\|\overline{X})$, respectively. □
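The two-point spectrum can be checked numerically at finite $n$. The sketch below is an illustration under assumptions not taken from the text: a binary alphabet, a two-component Bernoulli mixture as the null hypothesis, a Bernoulli alternative, and hypothetical names (`spectrum_cdf`, `thetas`, `q`). On a binary alphabet the probability of a sequence depends only on its number of ones, so the distribution of the divergence density rate can be summed exactly over that count.

```python
import math

def log_binom_pmf(n, k, p):
    """Logarithm of the Binomial(n, p) probability of k ones."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def spectrum_cdf(n, weights, thetas, q, rate):
    """Pr{(1/n) log P_{X^n}(X^n) / P_{Xbar^n}(X^n) <= rate} under the mixture."""
    prob = 0.0
    for k in range(n + 1):
        # log-probability of one particular sequence with k ones
        log_mix_seq = log_sum_exp([
            math.log(a) + k * math.log(t) + (n - k) * math.log(1.0 - t)
            for a, t in zip(weights, thetas)
        ])
        log_alt_seq = k * math.log(q) + (n - k) * math.log(1.0 - q)
        if (log_mix_seq - log_alt_seq) / n <= rate:
            # probability under the mixture that the sequence has k ones
            prob += sum(a * math.exp(log_binom_pmf(n, k, t))
                        for a, t in zip(weights, thetas))
    return prob

# Hypothetical parameters: D(0.5 || 0.3) is about 0.087 and D(0.8 || 0.3) about 0.534 nats.
weights, thetas, q = [0.4, 0.6], [0.5, 0.8], 0.3
below = spectrum_cdf(2000, weights, thetas, q, 0.05)
between = spectrum_cdf(2000, weights, thetas, q, 0.3)
above = spectrum_cdf(2000, weights, thetas, q, 0.6)
```

For a rate strictly between the two divergences, the computed probability is essentially the weight of the component with the smaller divergence, which is the two-peak picture described in the example.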



We conclude this section by mentioning the hypothesis testing with a compound source as the null hypothesis, which is deeply related to the hypothesis testing with a mixed source as the null hypothesis (called the mixed hypothesis testing) described above. First, suppose that infinitely many null hypotheses $\mathbf{X}_i = \{X_i^n\}_{n=1}^{\infty}$ ($i = 1, 2, \cdots$) and an alternative hypothesis $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ are given. If we define an acceptance region $A_n \subset \mathcal{X}^n$, the error probability of the first kind is determined by

\[ \mu_n^{(i)} = \Pr\{X_i^n \notin A_n\} \tag{4.1.14} \]

for each null hypothesis $\mathbf{X}_i = \{X_i^n\}_{n=1}^{\infty}$, and the error probability of the second kind is determined by

\[ \lambda_n = \Pr\{\overline{X}^n \in A_n\}. \tag{4.1.15} \]

Here, note that the acceptance region $A_n \subset \mathcal{X}^n$ above does not depend on the suffixes $i = 1, 2, \cdots$ of the null hypotheses $\mathbf{X}_i$. Such a situation happens when one of the null hypotheses $\mathbf{X}_i$ ($i = 1, 2, \cdots$) surely occurs but a hypothesis tester does not know which one occurs. We call such hypothesis testing the compound hypothesis testing $\{\mathbf{X}_i\}_{i=1}^{\infty}$ against $\overline{\mathbf{X}}$. In the compound hypothesis testing we want to keep the error probability of the first kind small for any null hypothesis that can occur. We attempt to make the error probability of the second kind as small as possible under such a requirement. We give the following definitions.

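Definition 4.1.4.

Rate $R$ is achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists an acceptance region $A_n \subset \mathcal{X}^n$ satisfying $\lim_{n\to\infty} \mu_n^{(i)} = 0$ ($\forall i = 1, 2, \cdots$) and

\[ \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\lambda_n} \ge R. \]

Definition 4.1.5 (Supremum achievable error probability exponent in the compound hypothesis testing).

\[ B(\{\mathbf{X}_i\}_{i=1}^{\infty} \| \overline{\mathbf{X}}) = \sup \{R \mid R \text{ is achievable}\}. \]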



Then, we obtain the following theorem describing a relationship between the 
supremum achievable error probability exponents of the mixed hypothesis 
testing and the compound hypothesis testing. The theorem corresponds to 
Theorem 3.3.5 in Chapter 3 treating channel coding. 

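Theorem 4.1.4. Suppose that countably infinite null hypotheses $\mathbf{X}_i = \{X_i^n\}_{n=1}^{\infty}$ ($i = 1, 2, \cdots$) and an alternative hypothesis $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ are given. Then, the supremum achievable error probability exponent $B(\{\mathbf{X}_i\}_{i=1}^{\infty} \| \overline{\mathbf{X}})$ in the compound hypothesis testing is equal to $B(\mathbf{X}\|\overline{\mathbf{X}}) \equiv B(\{\alpha_i, \mathbf{X}_i\}_{i=1}^{\infty} \| \overline{\mathbf{X}})$, the supremum achievable error probability exponent in the mixed hypothesis testing with the mixed source $\mathbf{X}$ defined by (4.1.4). That is, we have

\[ B(\{\alpha_i, \mathbf{X}_i\}_{i=1}^{\infty} \| \overline{\mathbf{X}}) = B(\{\mathbf{X}_i\}_{i=1}^{\infty} \| \overline{\mathbf{X}}), \tag{4.1.16} \]

where we assume that $\alpha_i > 0$ for all $i = 1, 2, \cdots$.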

Proof. Let $\mu_n^{(i)}$ ($i = 1, 2, \cdots$) and $\mu_n$ be the error probabilities of the first kind for the null hypotheses $\mathbf{X}_i$ ($i = 1, 2, \cdots$) and the mixed null hypothesis $\mathbf{X}$ with the same acceptance region $A_n$. From (4.1.4), we have

\[ \mu_n = \sum_{i=1}^{\infty} \alpha_i \mu_n^{(i)}. \]

Then, we can prove this theorem similarly to the proof of Theorem 3.3.5 in Chapter 3. □

The combination of Corollary 4.1.2 with Theorem 4.1.4 immediately yields 
the following corollary on the compound hypothesis testing. 

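Corollary 4.1.3. Let $\mathcal{X}$ be an arbitrary (not necessarily countable) source alphabet. If $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ and $\mathbf{X}_i = \{X_i^n\}_{n=1}^{\infty}$ ($i = 1, 2, \cdots$) are stationary memoryless sources subject to $P_{\overline{X}}$ and $P_{X_i}$ ($i = 1, 2, \cdots$), respectively, then we have

\[ B(\{\mathbf{X}_i\}_{i=1}^{\infty} \| \overline{\mathbf{X}}) = \inf_{i \ge 1} D(X_i\|\overline{X}) \tag{4.1.17} \]

for the compound hypothesis testing $\{\mathbf{X}_i\}_{i=1}^{\infty}$ against $\overline{\mathbf{X}}$, where $D(X_i\|\overline{X})$ denotes the divergence. □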



4.2 ε-Hypothesis Testing

In the hypothesis testing described in the preceding section, the error probability of the first kind is required to satisfy

\[ \mu_n \equiv \Pr\{X^n \notin A_n\} \to 0 \quad \text{as } n \to \infty. \]

On the other hand, we can consider another requirement that the error probability of the first kind satisfies only

\[ \limsup_{n\to\infty} \mu_n \le \varepsilon \]

for an arbitrary constant $0 \le \varepsilon < 1$. The exponent of the error probability of the second kind is expected to be larger under this weakened requirement on the error probability of the first kind. This section is devoted to the analysis of this problem. We first give the definitions required for the analysis.

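Definition 4.2.1.

Rate $R$ is $\varepsilon$-achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists an acceptance region $A_n$ satisfying $\limsup_{n\to\infty} \mu_n \le \varepsilon$ and

\[ \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\lambda_n} \ge R. \]

Definition 4.2.2 (Supremum ε-achievable error probability exponent).

\[ B_f(\varepsilon|\mathbf{X}\|\overline{\mathbf{X}}) = \sup \{R \mid R \text{ is } \varepsilon\text{-achievable}\}. \]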



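We define a function $K(R)$ by

\[ K(R) = \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R \right\} \tag{4.2.1} \]

(see Fig. 4.2). This function is nothing but $\overline{K}(R|\mathbf{X}\|\overline{\mathbf{X}})$ defined in §4.1. Then, we obtain the following theorem.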

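Theorem 4.2.1 (Chen [14]).

\[ B_f(\varepsilon|\mathbf{X}\|\overline{\mathbf{X}}) = \sup \{R \mid K(R) \le \varepsilon\} \quad (0 \le \forall\varepsilon < 1). \tag{4.2.2} \]

Remark 4.2.1. The right-hand side of (4.2.2) is a right-continuous and monotone increasing function of $\varepsilon$. □

Fig. 4.2. The function $K(R)$ plotted against $R$.

Proof of Theorem 4.2.1.

1) Direct part:

Define $R_0 = \sup \{R \mid K(R) \le \varepsilon\}$. We prove that for an arbitrary $\gamma > 0$, $R = R_0 - \gamma$ is $\varepsilon$-achievable. First, from the definition of $R_0 = \sup \{R \mid K(R) \le \varepsilon\}$, we have $K(R) \le \varepsilon$, i.e.,

\[ \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R \right\} \le \varepsilon. \tag{4.2.3} \]

If we consider the hypothesis testing with the acceptance region

\[ A_n = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n} \log \frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} > R \right\}, \]

Lemma 4.1.1 with $t = R$ implies

\[ \lambda_n \le e^{-nR}. \tag{4.2.4} \]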



We notice here that, since the left-hand side of (4.2.3) is equal to $\limsup_{n\to\infty} \mu_n$, we have

\[ \limsup_{n\to\infty} \mu_n \le \varepsilon. \]

In addition, (4.2.4) guarantees that

\[ \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\lambda_n} \ge R. \]

Consequently, $R = R_0 - \gamma$ is $\varepsilon$-achievable. Since $\gamma > 0$ can be arbitrarily small, $B_f(\varepsilon|\mathbf{X}\|\overline{\mathbf{X}}) \ge R_0$ is established.



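2) Converse part:

Suppose that $R$ is $\varepsilon$-achievable. Then, there exists an acceptance region $A_n$ satisfying

\[ \limsup_{n\to\infty} \mu_n \le \varepsilon \quad \text{and} \quad \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\lambda_n} \ge R. \]

The second inequality leads to

\[ \lambda_n \le e^{-n(R-\gamma)} \quad (\forall n \ge n_0), \tag{4.2.5} \]

where $\gamma > 0$ is an arbitrary constant. If we set $t = R - 2\gamma$ and apply Lemma 4.1.2, it follows that

\[ \mu_n + e^{n(R-2\gamma)} \lambda_n \ge \Pr\left\{ \frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R - 2\gamma \right\}. \]

By substituting (4.2.5) into the left-hand side and taking $\limsup_{n\to\infty}$ of both sides, we obtain

\[ \varepsilon \ge \limsup_{n\to\infty} \mu_n \ge \limsup_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R - 2\gamma \right\}, \]

i.e.,

\[ K(R - 2\gamma) \le \varepsilon. \tag{4.2.6} \]

Here, define $R_0 = \sup \{R \mid K(R) \le \varepsilon\}$ and assume that $R > R_0$. We can choose a sufficiently small $\gamma > 0$ such that $R - 2\gamma > R_0$. Then, the definition of $R_0$ gives rise to $K(R - 2\gamma) > \varepsilon$, which contradicts (4.2.6). Therefore, $R \le R_0$ must be satisfied. The proof of $B_f(\varepsilon|\mathbf{X}\|\overline{\mathbf{X}}) \le R_0$ is now completed. □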

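Now, consider the case that $\mathbf{X}$ and $\overline{\mathbf{X}}$ are the stationary memoryless sources subject to $P_X$ and $P_{\overline{X}}$, respectively. Since Khintchin's law of large numbers implies that $K(R)$ can be expressed as

\[ K(R) = \begin{cases} 0 & \text{for } 0 < R < D(X\|\overline{X}), \\ 1 & \text{for } R > D(X\|\overline{X}), \end{cases} \]

we obtain the following corollary from Theorem 4.2.1.

Corollary 4.2.1 (Stein's lemma).

\[ B_f(\varepsilon|\mathbf{X}\|\overline{\mathbf{X}}) = D(X\|\overline{X}) \quad (0 \le \forall\varepsilon < 1). \tag{4.2.7} \]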
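Stein's lemma can be observed numerically on a small case. The following sketch is only an illustration under assumptions of our own: a binary alphabet with $P = \mathrm{Bernoulli}(p)$ and $\overline{P} = \mathrm{Bernoulli}(q)$, $p > q$, and an acceptance region defined by a threshold on the number of ones. The names `type2_exponent`, `k_star` and the numerical values are hypothetical.

```python
import math

def log_binom_pmf(n, k, p):
    """Logarithm of the Binomial(n, p) probability of k ones."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def kl_bernoulli(p, q):
    """D(Bernoulli(p) || Bernoulli(q)) in nats, for 0 < p, q < 1."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def type2_exponent(n, p, q, eps):
    """-(1/n) log lambda_n for the region {k >= k_star}, assuming p > q and 0 <= eps < 1."""
    # Largest k_star such that the first-kind error Pr_P{k < k_star} stays <= eps.
    cumulative, k_star = 0.0, 0
    for k in range(n + 1):
        pk = math.exp(log_binom_pmf(n, k, p))
        if cumulative + pk > eps:
            break
        cumulative += pk
        k_star = k + 1
    # lambda_n = Pr_Q{k >= k_star}, summed in the log domain to avoid underflow.
    logs = [log_binom_pmf(n, k, q) for k in range(k_star, n + 1)]
    m = max(logs)
    log_lambda = m + math.log(sum(math.exp(v - m) for v in logs))
    return -log_lambda / n

p, q, eps = 0.6, 0.3, 0.1          # hypothetical parameters
limit = kl_bernoulli(p, q)         # D(P || P_bar), about 0.192 nats
gap_500 = limit - type2_exponent(500, p, q, eps)
gap_4000 = limit - type2_exponent(4000, p, q, eps)
```

The gap between the finite-$n$ exponent and the divergence shrinks as $n$ grows, whatever fixed $\varepsilon$ is used, which is the content of (4.2.7).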



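Example 4.2.1. Let us consider the case that $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ is the stationary memoryless source subject to a probability distribution $\overline{P}$ over $\mathcal{X}$ and $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is the mixed source of $\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$ and $\mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}$ with the probability distribution given by

\[ P_{X^n}(\mathbf{x}) = \alpha_1 P_{X_1^n}(\mathbf{x}) + \alpha_2 P_{X_2^n}(\mathbf{x}). \]

Here, $\mathbf{X}_1$ and $\mathbf{X}_2$ are the stationary memoryless sources subject to probability distributions $P_1$ and $P_2$, respectively. If we apply the property described in Remark 1.4.1 in §1.4 to

\[ \frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)}, \]

$K(R)$ can be expressed as

\[ K(R) = \begin{cases} 0 & \text{for } 0 < R < D(P_1\|\overline{P}), \\ \alpha_1 & \text{for } D(P_1\|\overline{P}) < R < D(P_2\|\overline{P}). \end{cases} \]

Thus, $B_f(\varepsilon|\mathbf{X}\|\overline{\mathbf{X}})$, which is illustrated in Fig. 4.3, is dependent on $0 \le \varepsilon < 1$.

□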
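The dependence on $\varepsilon$ can be made concrete with a short sketch. It assumes a finite mixture with distinct component divergences and evaluates $\sup\{R \mid w\{D < R\} \le \varepsilon\}$ with $w$ the discrete measure given by the mixing weights; the function name `epsilon_exponent` and the numbers are hypothetical and serve only as an illustration.

```python
def epsilon_exponent(weights, divergences, eps):
    """sup{R : total weight of components with divergence < R is <= eps}."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must satisfy 0 <= eps < 1")
    cumulative = 0.0
    # Walk through the components in order of increasing divergence.
    for d, a in sorted(zip(divergences, weights)):
        cumulative += a
        if cumulative > eps:
            return d
    return max(divergences)  # reached only through rounding of the weights

# Hypothetical two-component case: D(P_1 || P_bar) = 0.2, D(P_2 || P_bar) = 0.8.
weights, divergences = [0.3, 0.7], [0.2, 0.8]
small_eps = epsilon_exponent(weights, divergences, 0.1)   # eps < alpha_1
at_alpha = epsilon_exponent(weights, divergences, 0.3)    # eps = alpha_1
large_eps = epsilon_exponent(weights, divergences, 0.9)   # eps > alpha_1
```

In this two-component case the exponent equals the smaller divergence while $\varepsilon < \alpha_1$ and jumps to the larger one as soon as $\varepsilon \ge \alpha_1$, giving a right-continuous step function of $\varepsilon$.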



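Fig. 4.3. $B_f(\varepsilon|\mathbf{X}\|\overline{\mathbf{X}})$ as a function of $\varepsilon$, with the levels $D(P_1\|\overline{P})$ and $D(P_2\|\overline{P})$ marked on the vertical axis.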



Example 4.2.2. In the example given in Example 4.1.3 in §4.1, Lemma 4.1.4 guarantees

\[ \int_{\{\theta \,|\, D(X_\theta\|\overline{X}) < R\}} dw(\theta) \le K(R) \le \int_{\{\theta \,|\, D(X_\theta\|\overline{X}) \le R\}} dw(\theta). \]

This shows that $K(R)$ is a monotone increasing function of $R$. Therefore, the formula

\[ B_f(\varepsilon|\mathbf{X}\|\overline{\mathbf{X}}) = \sup \left\{ R \,\middle|\, \int_{\{\theta \,|\, D(X_\theta\|\overline{X}) < R\}} dw(\theta) \le \varepsilon \right\} \tag{4.2.8} \]

is obtained from Theorem 4.2.1. This function is also monotone increasing with respect to $\varepsilon$. □



4.3 Strong Converse Theorem for Hypothesis Testing 

We also have the strong converse theorem on hypothesis testing corresponding to the strong converse theorems on source coding (§1.5), random number generation (§2.3) and channel coding (§3.5).

Definition 4.3.1. Consider the hypothesis testing $\mathbf{X}$ against $\overline{\mathbf{X}}$ and choose a rate $R$ satisfying $R > B(\mathbf{X}\|\overline{\mathbf{X}})$ (cf. Definition 4.1.2) arbitrarily. If $\lim_{n\to\infty} \mu_n = 1$ holds for all acceptance regions $A_n$ satisfying $\liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\lambda_n} \ge R$, the hypothesis testing $\mathbf{X}$ against $\overline{\mathbf{X}}$ is said to satisfy the strong converse property. □



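Here, we define:

Definition 4.3.2.

\[ \overline{D}(\mathbf{X}\|\overline{\mathbf{X}}) = \text{p-}\limsup_{n\to\infty} \frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)}, \]

and call $\overline{D}(\mathbf{X}\|\overline{\mathbf{X}})$ the spectral sup-divergence rate of $\mathbf{X}$ against $\overline{\mathbf{X}}$.

Then, we have the following theorem.

Theorem 4.3.1 (Strong converse theorem). The hypothesis testing $\mathbf{X}$ against $\overline{\mathbf{X}}$ satisfies the strong converse property if and only if

\[ \underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) = \overline{D}(\mathbf{X}\|\overline{\mathbf{X}}). \]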

Remark 4.3.1. This theorem means that the hypothesis testing satisfies the 
strong converse property if and only if the information-spectrum of the di- 
vergence density rate asymptotically becomes the one-point spectrum with a 
peak of probability one. □ 



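Proof of Theorem 4.3.1.

1) Sufficiency:

Assume that $\underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) = \overline{D}(\mathbf{X}\|\overline{\mathbf{X}})$. For an arbitrary constant $\gamma > 0$ define $R$ by

\[ R = B(\mathbf{X}\|\overline{\mathbf{X}}) + 3\gamma = \underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) + 3\gamma \tag{4.3.1} \]

and consider an arbitrary hypothesis testing with an acceptance region $A_n$ satisfying

\[ \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\lambda_n} \ge R. \]

Then, it follows that

\[ \frac{1}{n} \log \frac{1}{\lambda_n} \ge R - \gamma \quad (\forall n \ge n_0), \]

which can be written as

\[ \lambda_n \le e^{-n(R-\gamma)} \quad (\forall n \ge n_0). \tag{4.3.2} \]

By noticing that $\mu_n = \Pr\{X^n \notin A_n\}$ and $\lambda_n = \Pr\{\overline{X}^n \in A_n\}$ and substituting (4.3.2) into the inequality in Lemma 4.1.2, setting $t = R - 2\gamma$, we obtain

\[ \mu_n + e^{-n\gamma} \ge \Pr\left\{ \frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R - 2\gamma \right\}. \tag{4.3.3} \]

We notice here that (4.3.1) implies $R - 2\gamma = \overline{D}(\mathbf{X}\|\overline{\mathbf{X}}) + \gamma$ due to the assumption of $\underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) = \overline{D}(\mathbf{X}\|\overline{\mathbf{X}})$. Therefore, we obtain

\[ \lim_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R - 2\gamma \right\} = 1. \]

Then, (4.3.3) guarantees $\liminf_{n\to\infty} \mu_n \ge 1$, i.e., $\lim_{n\to\infty} \mu_n = 1$.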

n^oo n— >00 



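2) Necessity:

Define $R = B(\mathbf{X}\|\overline{\mathbf{X}}) + \gamma$ for an arbitrary constant $\gamma > 0$. If we consider a hypothesis testing with an acceptance region $A_n$ defined by

\[ A_n = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n} \log \frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} > R \right\}, \tag{4.3.4} \]

Lemma 4.1.1 implies that $\lambda_n \le e^{-nR}$, i.e.,

\[ \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\lambda_n} \ge R > B(\mathbf{X}\|\overline{\mathbf{X}}). \]

Then, it follows from the assumption of the strong converse property that

\[ \lim_{n\to\infty} \Pr\{X^n \notin A_n\} = \lim_{n\to\infty} \mu_n = 1. \]

By using (4.3.4), we obtain

\[ \lim_{n\to\infty} \Pr\left\{ \frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} > R \right\} = 0, \]

which leads to

\[ \overline{D}(\mathbf{X}\|\overline{\mathbf{X}}) \le R = B(\mathbf{X}\|\overline{\mathbf{X}}) + \gamma = \underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) + \gamma. \]

Since $\gamma > 0$ is arbitrary,

\[ \overline{D}(\mathbf{X}\|\overline{\mathbf{X}}) \le \underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) \]

is established. By noticing that $\overline{D}(\mathbf{X}\|\overline{\mathbf{X}}) \ge \underline{D}(\mathbf{X}\|\overline{\mathbf{X}})$, the inequality in the opposite direction, always holds, $\overline{D}(\mathbf{X}\|\overline{\mathbf{X}}) = \underline{D}(\mathbf{X}\|\overline{\mathbf{X}})$ is established. □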



Remark 4.3.2. If the hypothesis testing satisfies the strong converse property, then $B_f(\varepsilon|\mathbf{X}\|\overline{\mathbf{X}})$ becomes a constant independent of $0 \le \varepsilon < 1$ (however, the converse is not always true). In particular, if both $\mathbf{X}$ and $\overline{\mathbf{X}}$ are stationary and memoryless, Khintchin's law of large numbers guarantees that the strong converse property is satisfied. Therefore, Corollary 4.2.1 in the preceding section can also be obtained as a consequence of Theorem 4.3.1. □



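Remark 4.3.3. For $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ we define $D(\mathbf{X}\|\overline{\mathbf{X}})$ by

\[ D(\mathbf{X}\|\overline{\mathbf{X}}) = \liminf_{n\to\infty} \frac{1}{n} D(X^n\|\overline{X}^n) \tag{4.3.5} \]

and call $D(\mathbf{X}\|\overline{\mathbf{X}})$ the inf-divergence rate of $\mathbf{X}$ with respect to $\overline{\mathbf{X}}$ (in particular, $D(\mathbf{X}\|\overline{\mathbf{X}})$ is simply called the divergence rate if the right-hand side of (4.3.5) has a limit). Then, the inequality

\[ \underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) \le D(\mathbf{X}\|\overline{\mathbf{X}}) \tag{4.3.6} \]

can be proved for an arbitrary alphabet similarly to the proof of Theorem 3.5.2 using Lemma 3.2.4 with $X^n$ and $\overline{X}^n$ instead of $U_n$ and $V_n$, respectively. On the other hand, the inequality

\[ D(\mathbf{X}\|\overline{\mathbf{X}}) \le \overline{D}(\mathbf{X}\|\overline{\mathbf{X}}) \tag{4.3.7} \]

does not always hold even if $\mathcal{X}$ is a finite alphabet. Therefore, the strong converse property does not always guarantee

\[ \underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) = D(\mathbf{X}\|\overline{\mathbf{X}}) = \overline{D}(\mathbf{X}\|\overline{\mathbf{X}}). \tag{4.3.8} \]

This fact means that the property corresponding to Corollary 1.7.1 on source coding and Corollary 3.5.1 on channel coding does not hold for hypothesis testing. □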



Though in Remark 4.3.3 above we have seen that the strong converse property does not always imply (4.3.8), we can make (4.3.8) true under a certain condition on sources $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$. That is, for an arbitrary source alphabet $\mathcal{X}$, if $\mathbf{X}$ and $\overline{\mathbf{X}}$ are a stationary ergodic source and a stationary irreducible Markov source of finite order, respectively, then the divergence density rate $\frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)}$ converges almost surely to $\lim_{n\to\infty} \frac{1}{n} D(X^n\|\overline{X}^n)$ (Barron [7]), which implies that $\frac{1}{n} D(X^n\|\overline{X}^n)$ on the right-hand side of (4.3.5) has a limit and satisfies (4.3.8). Therefore, the hypothesis testing $\mathbf{X}$ against $\overline{\mathbf{X}}$ for such $\mathbf{X}$ and $\overline{\mathbf{X}}$ satisfies the strong converse property, and hence, we have:

Theorem 4.3.2.

\[ B_f(\varepsilon|\mathbf{X}\|\overline{\mathbf{X}}) = \lim_{n\to\infty} \frac{1}{n} D(X^n\|\overline{X}^n) \quad (0 \le \forall\varepsilon < 1). \tag{4.3.9} \]

This theorem is regarded as a considerable generalization of the formulae (4.1.1) and (4.2.7) given in the preceding sections.

Now, consider a special case that $\mathcal{X}$ is a finite source alphabet and $\mathbf{X}$ and $\overline{\mathbf{X}}$ are stationary irreducible Markov sources of the first order with transition probabilities $P(\cdot|\cdot)$ and $\overline{P}(\cdot|\cdot)$, respectively. Then, the formula (4.3.9) yields

\[ B_f(\varepsilon|\mathbf{X}\|\overline{\mathbf{X}}) = D(P\|\overline{P}|p) \quad (0 \le \forall\varepsilon < 1), \tag{4.3.10} \]

where $p$ denotes the stationary distribution of $P$ and the conditional divergence is defined by

\[ D(P\|\overline{P}|p) = \sum_{x \in \mathcal{X}} p(x) D(P(\cdot|x)\|\overline{P}(\cdot|x)). \]

This is a natural generalization of Stein's lemma (Corollary 4.2.1).
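A minimal sketch of how the conditional divergence in (4.3.10) could be evaluated for small chains follows. It assumes finite row-stochastic matrices indexed as `P[x][y]` for the probability of moving from `x` to `y`, an aperiodic chain so that plain power iteration converges, and $\overline{P}$ positive wherever $P$ is; the function names and the matrices are hypothetical.

```python
import math

def stationary_distribution(P, iterations=1000):
    """Stationary distribution of a row-stochastic matrix by power iteration."""
    n = len(P)
    p = [1.0 / n] * n
    for _ in range(iterations):
        # One step of the chain: p_next(y) = sum_x p(x) P(y | x).
        p = [sum(p[x] * P[x][y] for x in range(n)) for y in range(n)]
    return p

def conditional_divergence(P, P_bar, p):
    """sum_x p(x) D(P(.|x) || P_bar(.|x)) in nats."""
    total = 0.0
    for x, row in enumerate(P):
        for y, pxy in enumerate(row):
            if pxy > 0.0:
                total += p[x] * pxy * math.log(pxy / P_bar[x][y])
    return total

# Hypothetical two-state example; the stationary distribution of P is (0.8, 0.2).
P = [[0.9, 0.1], [0.4, 0.6]]
P_bar = [[0.5, 0.5], [0.5, 0.5]]
p = stationary_distribution(P)
exponent = conditional_divergence(P, P_bar, p)
```

When every row of $P$ equals the same distribution, and likewise for $\overline{P}$, the value reduces to the ordinary divergence, in line with Stein's lemma for memoryless sources.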

Remark 4.3.4. By considering that the hypothesis testing $\mathbf{X}$ against $\overline{\mathbf{X}}$ in (4.3.10) satisfies the strong converse property, Lemma 4.1.4 and Theorem 4.1.3 in §4.1 are generalized in the following way. If $\overline{\mathbf{X}}$ and $\mathbf{X}_\theta$ are the stationary irreducible Markov sources of the first order with transition probabilities $\overline{P}(\cdot|\cdot)$ and $P_\theta(\cdot|\cdot)$, respectively, then all of (4.1.11) and (4.1.12) in §4.1 and (4.2.8) in §4.2 hold, where $D(X_\theta\|\overline{X})$ is replaced by the conditional divergence $D(P_\theta\|\overline{P}|p_\theta)$ (see Remark 1.4.4 in §1.4 of Chapter 1). □



4.4 Hypothesis Testing and Large Deviation Probability 
of Testing Error 

In §4.1 and §4.2 we have studied the hypothesis testing problems with the error probability of the first kind $\mu_n$ converging to 0 or asymptotically bounded by a constant $0 \le \varepsilon < 1$. In this section we consider the hypothesis testing with the error probability of the first kind $\mu_n$ required to asymptotically satisfy

\[ \mu_n \le e^{-nr} \]

for a given constant $r > 0$. Under this constraint we would like to find the maximum of the exponent $R > 0$ when the error probability of the second kind $\lambda_n$ is expressed as $\lambda_n \simeq e^{-nR}$. Such a problem formulation means to simultaneously evaluate the large deviation behaviors of $\mu_n$ and $\lambda_n$ similarly to the analysis of the large deviation behaviors on the fixed-length source coding given in §1.9. The idea of the information-spectrum slicing plays an important role in this section as well as in §1.9.

First, we give two definitions. In this section as well, we denote the null hypothesis and the alternative hypothesis by $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$, respectively.

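Definition 4.4.1.

Rate $R$ is $r$-achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists an acceptance region $A_n$ satisfying

\[ \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\mu_n} \ge r \quad \text{and} \quad \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\lambda_n} \ge R. \]

Definition 4.4.2 (Supremum r-achievable error probability exponent).

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = \sup \{R \mid R \text{ is } r\text{-achievable}\}. \]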



The objective of this section is determining this $B_e(r|\mathbf{X}\|\overline{\mathbf{X}})$ as a (left-continuous and monotone decreasing) function of $r$. To this end, we define $\eta(R)$ by

\[ \eta(R) = \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\Pr\left\{ \frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R \right\}}. \tag{4.4.1} \]

Though $\eta(R)$ is clearly a monotone decreasing function of $R$, $\eta(R)$ is not continuous in general.



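Lemma 4.4.1. If $R > \underline{D}(\mathbf{X}\|\overline{\mathbf{X}})$, then $\eta(R) = 0$.

Proof. If $R > \underline{D}(\mathbf{X}\|\overline{\mathbf{X}})$, then the definition of $\underline{D}(\mathbf{X}\|\overline{\mathbf{X}})$ guarantees the existence of some $0 < \varepsilon_0 < 1$ such that

\[ \Pr\left\{ \frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R \right\} \ge \varepsilon_0 \]

for infinitely many $n$. Hence,

\[ \eta(R) \le \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\varepsilon_0} = 0. \]

□

Lemma 4.4.1 means that $R \le \underline{D}(\mathbf{X}\|\overline{\mathbf{X}})$ must be satisfied for $\eta(R) > 0$.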

We have the following quite general theorem.

Theorem 4.4.1 (Han [38]). For an arbitrary $r \ge 0$,

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = \inf_{R} \{R + \eta(R) \mid \eta(R) < r\}, \tag{4.4.2} \]

where $B_e(0|\mathbf{X}\|\overline{\mathbf{X}})$ ($r = 0$) is defined as $+\infty$.

Remark 4.4.1. Note that the condition on the right-hand side of (4.4.2) is $\eta(R) < r$, not $\eta(R) \le r$. There is an essential difference between these two, as is clarified in the following proof. In addition, $R + \eta(R) \ge 0$ is satisfied for all $-\infty < R < +\infty$ since $\eta(R) \ge -R$ is guaranteed from Lemma 3.2.1 in Chapter 3. □



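Remark 4.4.2. From Lemma 4.4.1, we have

\[ \inf_{R \ge \underline{D}(\mathbf{X}\|\overline{\mathbf{X}})} \{R + \eta(R) \mid \eta(R) < r\} = \inf_{R \ge \underline{D}(\mathbf{X}\|\overline{\mathbf{X}})} R, \]

where the infimum on the right-hand side is attained at $R = \underline{D}(\mathbf{X}\|\overline{\mathbf{X}})$. Therefore, $\inf_{R}$ on the right of (4.4.2) can be replaced with $\inf_{R \le \underline{D}(\mathbf{X}\|\overline{\mathbf{X}})}$ if $\eta(R)$ is continuous at $R = \underline{D}(\mathbf{X}\|\overline{\mathbf{X}})$. □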



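Proof of Theorem 4.4.1.

1) Direct part:

We use the following notation:

\[ S_n(a) = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n} \log \frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} > a \right\}. \tag{4.4.3} \]

Set

\[ \underline{R} = \inf \{R \mid \eta(R) < r\} \tag{4.4.4} \]

and consider the hypothesis testing with the acceptance region

\[ A_n = S_n(\underline{R} - \gamma), \]

where $\gamma > 0$ is an arbitrarily small constant. Then, the error probability of the first kind can be written as

\[ \mu_n = \Pr\{X^n \notin S_n(\underline{R} - \gamma)\} = \Pr\left\{ \frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le \underline{R} - \gamma \right\}, \]

which leads to

\[ \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\mu_n} = \eta(\underline{R} - \gamma). \]

We notice here that (4.4.4) implies $\eta(\underline{R} - \gamma) \ge r$. Therefore,

\[ \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\mu_n} \ge r \tag{4.4.5} \]

is established. Next, we evaluate the error probability of the second kind. Set

\[ \rho_0 = \inf_{R} \{R + \eta(R) \mid \eta(R) < r\}. \tag{4.4.6} \]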

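Let $K$ be an arbitrarily large number satisfying $K > \rho_0$ and define $L = (K - \underline{R} + \gamma)/(2\gamma)$. Denote by

\[ I_i = (\underline{R} - \gamma + 2(i-1)\gamma,\ \underline{R} - \gamma + 2i\gamma] \quad (i = 1, 2, \cdots, L) \tag{4.4.7} \]

the $L$ subintervals of the interval $(\underline{R} - \gamma, K]$ each of which has the width $2\gamma$. According to these subintervals we partition the set

\[ T_0 = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \underline{R} - \gamma < \frac{1}{n} \log \frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} \le K \right\} \]

into $L$ subsets as follows:

\[ S_n^{(i)} = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n} \log \frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} \in I_i \right\} \quad (i = 1, 2, \cdots, L) \]

(information-spectrum slicing). In addition, set

\[ S_n^{(0)} = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n} \log \frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} > K \right\}. \]

It is clear that

\[ S_n(\underline{R} - \gamma) = \bigcup_{i=0}^{L} S_n^{(i)}. \tag{4.4.8} \]

If we set $b_i = \underline{R} - \gamma + 2i\gamma$ for simplicity, (4.4.7) can be expressed as

\[ I_i = (b_i - 2\gamma,\ b_i] \quad (i = 1, 2, \cdots, L). \]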

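Since for $i = 1, 2, \cdots, L$

\[ \Pr\{X^n \in S_n^{(i)}\} \le \Pr\left\{ \frac{1}{n} \log \frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le b_i \right\}, \]

it follows that

\[ \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\Pr\{X^n \in S_n^{(i)}\}} \ge \eta(b_i). \]

Hence, we obtain

\[ \Pr\{X^n \in S_n^{(i)}\} \le e^{-n(\eta(b_i) - \gamma)} \quad (\forall n \ge n_0). \tag{4.4.9} \]

We notice here that, since $\mathbf{x} \in S_n^{(i)}$ implies that

\[ \frac{1}{n} \log \frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} > b_i - 2\gamma, \]

we have the following inequality:

\[ P_{\overline{X}^n}(\mathbf{x}) \le P_{X^n}(\mathbf{x}) e^{-n(b_i - 2\gamma)}. \]

Then, it follows from (4.4.9) that

\[ \Pr\{\overline{X}^n \in S_n^{(i)}\} \le \sum_{\mathbf{x} \in S_n^{(i)}} P_{X^n}(\mathbf{x}) e^{-n(b_i - 2\gamma)} \le e^{-n(b_i + \eta(b_i) - 3\gamma)}. \tag{4.4.10} \]

If we note here that $b_i \ge \underline{R} + \gamma$ for all $i = 1, 2, \cdots, L$, we have

\[ b_i + \eta(b_i) \ge \rho_0 \quad (i = 1, 2, \cdots, L). \]

By substituting this into (4.4.10), it holds that

\[ \Pr\{\overline{X}^n \in S_n^{(i)}\} \le e^{-n(\rho_0 - 3\gamma)} \quad (i = 1, 2, \cdots, L). \tag{4.4.11} \]

On the other hand, by taking the fact that $P_{\overline{X}^n}(\mathbf{x}) \le P_{X^n}(\mathbf{x}) e^{-nK}$ for $\mathbf{x} \in S_n^{(0)}$ into consideration, we have

\[ \Pr\{\overline{X}^n \in S_n^{(0)}\} = \sum_{\mathbf{x} \in S_n^{(0)}} P_{\overline{X}^n}(\mathbf{x}) \le e^{-nK} \sum_{\mathbf{x} \in S_n^{(0)}} P_{X^n}(\mathbf{x}) \le e^{-nK}. \tag{4.4.12} \]

Then, (4.4.8), (4.4.11) and (4.4.12) lead to

\[ \lambda_n = \Pr\{\overline{X}^n \in S_n(\underline{R} - \gamma)\} \le L e^{-n(\rho_0 - 3\gamma)} + e^{-nK}. \]

We now obtain

\[ \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\lambda_n} \ge \rho_0 - 3\gamma \]

since $K > \rho_0$ guarantees that $K > \rho_0 - 3\gamma$ for $\gamma > 0$. By noticing (4.4.5), we can conclude that $\rho_0 - 3\gamma$ is $r$-achievable (note that $\gamma > 0$ can be arbitrarily small).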
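The bookkeeping of the information-spectrum slicing used in the direct part can be sketched as follows. The code only builds the half-open intervals $(b_i - 2\gamma, b_i]$ with $b_i = \underline{R} - \gamma + 2i\gamma$ and locates the slice that a given value of the divergence density rate falls into; the names `r_low` (standing for $\underline{R}$), `big_k` (standing for $K$), `slicing_intervals`, `slice_index` and the numerical values are hypothetical, and $L$ is assumed to be an integer.

```python
def slicing_intervals(r_low, big_k, gamma):
    """Endpoints of (b_i - 2*gamma, b_i] for i = 1..L, with b_i = r_low - gamma + 2*i*gamma."""
    count = round((big_k - r_low + gamma) / (2.0 * gamma))  # L, assumed to be an integer
    intervals = []
    for i in range(1, count + 1):
        b_i = r_low - gamma + 2.0 * i * gamma
        intervals.append((b_i - 2.0 * gamma, b_i))
    return intervals

def slice_index(value, r_low, big_k, gamma):
    """Index of the slice containing value: 0 if value > K, None if value <= r_low - gamma."""
    if value > big_k:
        return 0
    for i, (left, right) in enumerate(slicing_intervals(r_low, big_k, gamma), start=1):
        if left < value <= right:
            return i
    return None

# Hypothetical values chosen to be exactly representable in binary floating point.
r_low, big_k, gamma = 0.25, 1.125, 0.125
intervals = slicing_intervals(r_low, big_k, gamma)
```

Each slice has width $2\gamma$ and the slices tile $(\underline{R} - \gamma, K]$ without gaps, so on slice $i$ the likelihood ratio is pinned between $e^{n(b_i - 2\gamma)}$ and $e^{n b_i}$, which is what drives (4.4.10).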



2) Converse part:

Let $\underline{R}$ and $\rho_0$ be defined as in (4.4.4) and (4.4.6), respectively. Then, since $\eta(R)$ is monotone decreasing in $R$, there exists an $R_0$ satisfying $R_0 \ge \underline{R}$ and

\[ \lim_{\gamma \downarrow 0} (R_0 + \gamma + \eta(R_0 + \gamma)) = \rho_0. \tag{4.4.13} \]

Let us consider the set

\[ S_0 = \left\{ \mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n} \log \frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} \le R_0 + \gamma \right\}, \]

where $\gamma > 0$ is an arbitrarily small constant. Then, from the definition of $\eta(R)$, there exists some divergent sequence $n_1 < n_2 < \cdots \to \infty$ of integers such that

\[ \Pr\{X^{n_j} \in S_0\} \ge e^{-n_j(\eta(R_0 + \gamma) + \tau)} \quad (\forall j \ge j_0), \tag{4.4.14} \]

where $\tau > 0$ is an arbitrarily small constant. We prove the converse part by contradiction. To do so, assume that $R = \rho_0 + 2\delta$ ($\delta > 0$ is a fixed constant) is $r$-achievable, i.e., assume that there exists an acceptance region $A_n$ satisfying

\[ \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\mu_n} \ge r \tag{4.4.15} \]

and

\[ \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\lambda_n} \ge R = \rho_0 + 2\delta. \tag{4.4.16} \]


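Since $\mathbf{x} \in S_0$ implies

\[ P_{X^n}(\mathbf{x}) \le P_{\overline{X}^n}(\mathbf{x}) e^{n(R_0 + \gamma)}, \]

we have

\[ \Pr\{X^n \in S_0 \cap A_n\} = \sum_{\mathbf{x} \in S_0 \cap A_n} P_{X^n}(\mathbf{x}) \le e^{n(R_0 + \gamma)} \sum_{\mathbf{x} \in S_0 \cap A_n} P_{\overline{X}^n}(\mathbf{x}) \le e^{n(R_0 + \gamma)} \sum_{\mathbf{x} \in A_n} P_{\overline{X}^n}(\mathbf{x}) = e^{n(R_0 + \gamma)} \lambda_n. \tag{4.4.17} \]

Furthermore, it follows from (4.4.16) that

\[ \lambda_n \le e^{-n(\rho_0 + 2\delta - \gamma)} \quad (\forall n \ge n_0). \]

Substitution of this into (4.4.17) yields

\[ \Pr\{X^n \in S_0 \cap A_n\} \le e^{n(R_0 + \gamma)} e^{-n(\rho_0 + 2\delta - \gamma)} = e^{-n(\rho_0 - R_0 + 2\delta - 2\gamma)}. \tag{4.4.18} \]

By virtue of (4.4.13), for any $\gamma > 0$ small enough,

\[ \rho_0 \ge R_0 + \gamma + \eta(R_0 + \gamma) - \delta. \]

Therefore, by (4.4.18) we have

\[ \Pr\{X^n \in S_0 \cap A_n\} \le e^{-n(\eta(R_0 + \gamma) + \delta - \gamma)}. \]

If we choose $\tau > 0$ and $\gamma > 0$ so small as to satisfy $\delta \ge 2\tau + \gamma$, then

\[ \Pr\{X^n \in S_0 \cap A_n\} \le e^{-n(\eta(R_0 + \gamma) + 2\tau)}, \tag{4.4.19} \]

where $\tau > 0$ is the same one as in (4.4.14). On the other hand, by using (4.4.15), we obtain

\[ \Pr\{X^n \in S_0 \cap A_n^c\} \le \Pr\{X^n \in A_n^c\} = \mu_n \le e^{-n(r - \tau)} \quad (\forall n \ge n_0). \tag{4.4.20} \]

We observe here that $\eta(R_0 + \gamma) < r$ for all $\gamma > 0$, and hence, for any sufficiently small $\tau > 0$,

\[ \eta(R_0 + \gamma) + 2\tau \le r - \tau. \]

Then, it follows from (4.4.19) and (4.4.20) that

\[ \Pr\{X^n \in S_0\} = \Pr\{X^n \in S_0 \cap A_n\} + \Pr\{X^n \in S_0 \cap A_n^c\} \le e^{-n(\eta(R_0 + \gamma) + 2\tau)} + e^{-n(r - \tau)} \le 2 e^{-n(\eta(R_0 + \gamma) + 2\tau)} \tag{4.4.21} \]

for all $n \ge n_0$. However, since $\tau > 0$, (4.4.21) contradicts (4.4.14). Thus, the rate $R = \rho_0 + 2\delta$ cannot be $r$-achievable. Since $\delta > 0$ is arbitrary, we can conclude that any $R$ such that $R > \rho_0$ cannot be $r$-achievable. □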



Example 4.4.1. Let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ be stationary memoryless sources subject to probability distributions $P$ and $\overline{P}$ over a finite alphabet $\mathcal{X}$, respectively, and consider the corresponding hypothesis testing. We first define the plane

\[ \kappa_R = \left\{ Q \in \mathcal{P}(\mathcal{X}) \,\middle|\, \sum_{x \in \mathcal{X}} Q(x) \log \frac{P(x)}{\overline{P}(x)} = R \right\} \tag{4.4.22} \]

in $\mathcal{P}(\mathcal{X})$, where $\mathcal{P}(\mathcal{X})$ denotes the set of all probability distributions over $\mathcal{X}$. Denote by $P_R$ the projection of $P$ on $\kappa_R$ in the sense of the divergence (see Example 1.9.1 in §1.9). Figure 4.4 illustrates such a situation. Then, Sanov's theorem (cf. Dembo and Zeitouni [22]) tells us that $\eta(R) = 0$ if $R \ge D(P\|\overline{P})$ and $\eta(R) = D(P_R\|P)$ if $R < D(P\|\overline{P})$. Here, note that $\underline{D}(\mathbf{X}\|\overline{\mathbf{X}}) = D(P\|\overline{P})$ from the law of large numbers. Thus, in (4.4.2) we have only to consider $R$ satisfying $R \le D(P\|\overline{P})$ (see Remark 4.4.2). Since $P_R$ is on $\kappa_R$, $P_R$ satisfies the equation

\[ \sum_{x \in \mathcal{X}} P_R(x) \log \frac{P(x)}{\overline{P}(x)} = R, \]

Fig. 4.4.

which can be written as

\[ D(P_R\|\overline{P}) - D(P_R\|P) = R. \tag{4.4.23} \]

Hence, we have

\[ R + \eta(R) = D(P_R\|\overline{P}). \]

Here, note that (4.4.23) and the definition of $P_R$ imply that

\[ D(P_R\|\overline{P}) = \inf_{Q \in \kappa_R} D(Q\|\overline{P}). \]

Then, it follows from Theorem 4.4.1 that

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = \inf_{R} \{R + \eta(R) \mid \eta(R) < r\} = \inf_{R} \{D(P_R\|\overline{P}) \mid D(P_R\|P) < r\}, \]

which can be immediately written as

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = \inf_{Q:\, D(Q\|P) < r} D(Q\|\overline{P}). \tag{4.4.24} \]

This is nothing but Hoeffding's theorem [50], well-known in statistics. This formula also implies that $B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = 0$ for all $r > D(\overline{P}\|P)$. □
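For a binary alphabet the infimum in (4.4.24) is a one-dimensional problem and can be approximated on a grid. The sketch below does this by brute force; the Bernoulli parametrisation, the grid resolution, the function names `kl_bernoulli` and `hoeffding_exponent`, and the numbers are assumptions made for illustration and are not a method taken from the text.

```python
import math

def kl_bernoulli(a, b):
    """D(Bernoulli(a) || Bernoulli(b)) in nats, with the convention 0 log 0 = 0."""
    total = 0.0
    for x, y in ((a, b), (1.0 - a, 1.0 - b)):
        if x > 0.0:
            total += x * math.log(x / y)
    return total

def hoeffding_exponent(p, p_bar, r, grid=20000):
    """Grid approximation of inf{D(Q || P_bar) : D(Q || P) < r} over Bernoulli Q."""
    best = math.inf  # stays +inf when no Q is feasible, e.g. for r = 0
    for j in range(grid + 1):
        q = j / grid
        if kl_bernoulli(q, p) < r:
            best = min(best, kl_bernoulli(q, p_bar))
    return best

p, p_bar = 0.6, 0.3                         # hypothetical P and P_bar, both in (0, 1)
tight = hoeffding_exponent(p, p_bar, 0.01)  # strong constraint on the first-kind error
loose = hoeffding_exponent(p, p_bar, 0.05)
beyond = hoeffding_exponent(p, p_bar, 0.2)  # r exceeds D(P_bar || P), about 0.184
```

Increasing $r$ enlarges the feasible set, so the exponent of the second kind decreases, and it reaches 0 once $\overline{P}$ itself becomes feasible, in agreement with the last statement of the example.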



Example 4.4.2. Let $\mathcal{X}$ be a finite alphabet. Let us consider the case that the null hypothesis $\mathbf{X} = (X_1, X_2, \cdots)$ and the alternative hypothesis $\overline{\mathbf{X}} = (\overline{X}_1, \overline{X}_2, \cdots)$ are first-order stationary irreducible Markov sources subject to transition probabilities $P(x_2|x_1) = \Pr\{X_2 = x_2 | X_1 = x_1\}$ and $\overline{P}(x_2|x_1) = \Pr\{\overline{X}_2 = x_2 | \overline{X}_1 = x_1\}$ for $x_1, x_2 \in \mathcal{X}$, respectively. Similarly to Example 1.9.2 in §1.9 of Chapter 1, we denote by $\mathcal{P}(\mathcal{X} \times \mathcal{X})$ the set of all joint probability distributions over $\mathcal{X} \times \mathcal{X}$ and define the conditional divergences by

\[ D(Q\|P|q) = \sum_{x_1 \in \mathcal{X}} q(x_1) D(Q(\cdot|x_1)\|P(\cdot|x_1)), \]

\[ D(Q\|\overline{P}|q) = \sum_{x_1 \in \mathcal{X}} q(x_1) D(Q(\cdot|x_1)\|\overline{P}(\cdot|x_1)) \]

for any $Q \in \mathcal{P}(\mathcal{X} \times \mathcal{X})$, where $q(\cdot)$ and $Q(\cdot|\cdot)$ denote the marginal distribution and the conditional probability distribution defined by

\[ q(x_1) = \sum_{x_2 \in \mathcal{X}} Q(x_1, x_2), \qquad Q(x_2|x_1) = \frac{Q(x_1, x_2)}{q(x_1)}, \]

respectively. Then, from the argument using Sanov's theorem for stationary irreducible Markov sources similarly to Example 4.4.1, we have $\eta(R) = 0$ for $R \ge D(P\|\overline{P}|p)$ and

\[ \eta(R) = D(P_R\|P|p_R), \tag{4.4.25} \]

\[ R + \eta(R) = D(P_R\|\overline{P}|p_R) \tag{4.4.26} \]

for $R < D(P\|\overline{P}|p)$. Here, $p$ means the stationary distribution of $P$, $\mathcal{P}_0$ denotes the set of all probability distributions $Q \in \mathcal{P}(\mathcal{X} \times \mathcal{X})$ with the stationarity (see Example 1.9.2 in §1.9 of Chapter 1), $P_R \in \mathcal{P}_0$ denotes the projection of $P$ on the plane

\[ \kappa_R = \left\{ Q \in \mathcal{P}_0 \,\middle|\, \sum_{x_1, x_2 \in \mathcal{X}} Q(x_1, x_2) \log \frac{P(x_2|x_1)}{\overline{P}(x_2|x_1)} = R \right\} \tag{4.4.27} \]

and $p_R$ means the marginal distribution of $P_R$. We define the projection of $P$ as the distribution $P_R$ satisfying

\[ \inf_{Q \in \kappa_R} D(Q\|P|q) = D(P_R\|P|p_R), \]

where $q$ denotes the marginal distribution of $Q$. Hence, we obtain

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = \inf_{R} \{D(P_R\|\overline{P}|p_R) \mid D(P_R\|P|p_R) < r\} = \inf_{Q \in \mathcal{P}_0:\, D(Q\|P|q) < r} D(Q\|\overline{P}|q) \quad (\forall r > 0) \tag{4.4.28} \]

from Theorem 4.4.1 (cf. Natarajan [72]). This formula tells us that $B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = 0$ for all $r > D(\overline{P}\|P|\overline{p})$, where $\overline{p}$ denotes the stationary distribution of $\overline{P}$. □



Example 4.4.3. Let us generalize Example 4.4.2 above to the hypothesis testing for unifilar finite-state sources (see Example 1.9.3 in §1.9 of Chapter 1). To this end, let $\mathcal{X}$ be a finite source alphabet and $\mathcal{S}$ a finite set of states. Let the null hypothesis $\mathbf{X} = \{X^n = (X_1, \cdots, X_n)\}_{n=1}^{\infty}$ be the unifilar finite-state source subject to

\[ P_{X^n}(\mathbf{x}) = \prod_{i=1}^{n} P(x_i|s_i) \quad (\mathbf{x} = (x_1, x_2, \cdots, x_n) \in \mathcal{X}^n), \tag{4.4.29} \]

\[ s_{i+1} = f(x_i, s_i) \quad (s_i \in \mathcal{S};\ i = 1, 2, \cdots, n) \tag{4.4.30} \]

and the alternative hypothesis $\overline{\mathbf{X}} = \{\overline{X}^n = (\overline{X}_1, \cdots, \overline{X}_n)\}_{n=1}^{\infty}$ the unifilar finite-state source subject to

\[ P_{\overline{X}^n}(\mathbf{x}) = \prod_{i=1}^{n} \overline{P}(x_i|s_i) \quad (\mathbf{x} = (x_1, x_2, \cdots, x_n) \in \mathcal{X}^n), \tag{4.4.31} \]

\[ s_{i+1} = f(x_i, s_i) \quad (s_i \in \mathcal{S};\ i = 1, 2, \cdots, n). \tag{4.4.32} \]

We now fix an initial state $s_1 \in \mathcal{S}$ arbitrarily and denote by $\mathcal{S}_0$ the set of all states that can be reached from $s_1$ with "positive probability" with respect to $P(\cdot|\cdot)$. Next, let $XS = (X, S)$ be an arbitrary random variable taking values in $\mathcal{X} \times \mathcal{S}_0$ and define

\[ S' = f(X, S). \tag{4.4.33} \]

Furthermore, denote by $\mathcal{V}_0$ the set of all random variables $XS$ satisfying both the stationary condition

\[ P_{S'}(\cdot) = P_S(\cdot) \]

and the condition that the probability transition matrix $P_{S'|S}(\cdot|\cdot)$ is irreducible. Now, setting

\[ \kappa_R = \left\{ P_{XS} \in \mathcal{V}_0 \,\middle|\, \sum_{x \in \mathcal{X},\, s \in \mathcal{S}_0} P_{XS}(x, s) \log \frac{P(x|s)}{\overline{P}(x|s)} = R \right\}, \tag{4.4.34} \]

we define the projection $P_{X_R S_R} \in \mathcal{V}_0$ of $P(\cdot|\cdot)$ on the plane $\kappa_R$ by

\[ \inf_{P_{XS} \in \kappa_R} D(P_{XS}\|P|P_S) = D(P_{X_R S_R}\|P|P_{S_R}). \]

Then, similarly to Example 4.4.2, Sanov's theorem for unifilar finite-state sources (cf. Han [37]) yields

\[ \eta(R) = D(P_{X_R S_R}\|P|P_{S_R}), \tag{4.4.35} \]

\[ R + \eta(R) = D(P_{X_R S_R}\|\overline{P}|P_{S_R}). \tag{4.4.36} \]



Thus, by substituting these equalities into Theorem 4.4.1, we obtain the following formula for the hypothesis testing for unifilar finite-state sources $\mathbf{X}$ against $\overline{\mathbf{X}}$:

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = \inf_{R} \{D(P_{X_R S_R}\|\overline{P}|P_{S_R}) \mid D(P_{X_R S_R}\|P|P_{S_R}) < r\} = \inf_{P_{XS} \in \mathcal{V}_0:\, D(P_{XS}\|P|P_S) < r} D(P_{XS}\|\overline{P}|P_S) \quad (\forall r > 0), \tag{4.4.37} \]

where $P_{XS}$ and $P_S$ denote the probability distributions of random variables $XS$ and $S$, respectively, and the conditional divergences are defined by

\[ D(P_{XS}\|P|P_S) = \sum_{s \in \mathcal{S}_0} P_S(s) D(P_{X|S}(\cdot|s)\|P(\cdot|s)), \tag{4.4.38} \]

\[ D(P_{XS}\|\overline{P}|P_S) = \sum_{s \in \mathcal{S}_0} P_S(s) D(P_{X|S}(\cdot|s)\|\overline{P}(\cdot|s)). \tag{4.4.39} \]

Recall here that, in general, every unifilar finite-state source is asymptotically a mixed source of stationary (or periodic) irreducible sources (see Example 1.9.3 in §1.9 of Chapter 1).

If we consider the hypothesis testing $\mathbf{X}$ against $\overline{\mathbf{X}}$ for the unifilar finite-state sources $\mathbf{X}$ and $\overline{\mathbf{X}}$, it is easy to verify that the following formula on the supremum achievable error probability exponent $B(\mathbf{X}\|\overline{\mathbf{X}})$ (see Definition 4.1.2) holds:

\[ B(\mathbf{X}\|\overline{\mathbf{X}}) = \inf_{XS \in \widetilde{\mathcal{V}}_0} D(P_{XS}\|\overline{P}|P_S), \tag{4.4.40} \]

where $\widetilde{\mathcal{V}}_0$ denotes the set of all random variables $XS \in \mathcal{V}_0$ satisfying the condition $P_{X|S}(\cdot|\cdot) = P(\cdot|\cdot)$. □



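Example 4.4.4. Let $\mathcal{X}$ be a finite alphabet and consider the mixed source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and the stationary memoryless source $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ subject to the probability distribution $\overline{P}$ given in Example 4.2.1. Recall that the mixed source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is defined by

\[ P_{X^n}(\mathbf{x}) = \alpha_1 P_{X_1^n}(\mathbf{x}) + \alpha_2 P_{X_2^n}(\mathbf{x}) \quad (\forall \mathbf{x} \in \mathcal{X}^n) \tag{4.4.41} \]

for the two stationary memoryless sources $\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$, $\mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}$ subject to probability distributions $P_1$ and $P_2$, respectively. We define $\nu_1$ and $\nu_2$ by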

i^i = lQe V{X) 






(4.4.42) 






Q e P{X) 



E Q{P) log 



Pii^) 

P2{X) 




(4.4.43) 



as are defined in (1.9.34) and (1.9.35) in Example 1.9.4 in §1.9 of Chapter 1 
and two half-spaces in P(A’) by 



4'^ = g € p{x) 









(4.4.44) 






P{x) 



< R 



Q G V{X) 



(4.4.45) 






where $\mathcal{P}(\mathcal{X})$ denotes the space of all probability distributions over $\mathcal{X}$. By taking (1.9.39) and (1.9.40) in Example 1.9.4 into account, Sanov’s theorem yields

\[ \eta(R) = \min\left( D(P_R^{(1)}\|P_1),\ D(P_R^{(2)}\|P_2) \right), \tag{4.4.46} \]

where $P_R^{(1)}$ and $P_R^{(2)}$ denote the projections of $P_1$ and $P_2$ on $\nu_1\cap\kappa_R^{(1)}$ and $\nu_2\cap\kappa_R^{(2)}$, respectively. If we substitute this $\eta(R)$ into the right-hand side of (4.4.2) in Theorem 4.4.1, we can compute values of $B_e(r|\mathbf{X}\|\overline{\mathbf{X}})$ for the mixed source as a function of $r$.

We note here that it easily follows from (4.4.46) that $\eta(R) = 0$ for $R$ satisfying $R > \min(D(P_1\|\overline{P}), D(P_2\|\overline{P}))$ and $\eta(R)$ is a monotone decreasing and continuous function of $R$. Therefore,

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) \le \min(D(P_1\|\overline{P}), D(P_2\|\overline{P})) \quad (\forall r>0). \tag{4.4.47} \]

On the other hand, since $\eta(h) > 0$ for any rate $h$ satisfying $h < \min(D(P_1\|\overline{P}), D(P_2\|\overline{P}))$ is verified from (4.4.46), we have

\[ \inf_R \{ R + \eta(R) \mid \eta(R) < \eta(h) \} \ge h, \]

which implies that $h$ is $\eta(h)$-achievable. Hence, it holds that

\[ \lim_{r\downarrow 0} B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = \min(D(P_1\|\overline{P}), D(P_2\|\overline{P})). \tag{4.4.48} \]

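Example 4.4.5. Example 4.4.4 can be generalized in the following way (suppose that $\mathcal{X}$ is a finite alphabet). First, denote by $\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$, $\mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}$, $\overline{\mathbf{X}}_1 = \{\overline{X}_1^n\}_{n=1}^{\infty}$ and $\overline{\mathbf{X}}_2 = \{\overline{X}_2^n\}_{n=1}^{\infty}$ the stationary memoryless sources subject to probability distributions $P_1$, $P_2$, $\overline{P}_1$ and $\overline{P}_2$, respectively. Consider the hypothesis testing with the mixed source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ defined by

\[ P_{X^n}(\mathbf{x}) = \alpha_1 P_{X_1^n}(\mathbf{x}) + \alpha_2 P_{X_2^n}(\mathbf{x}) \quad (\alpha_1>0,\ \alpha_2>0,\ \alpha_1+\alpha_2=1) \tag{4.4.49} \]

as the null hypothesis and the mixed source $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ defined by

\[ P_{\overline{X}^n}(\mathbf{x}) = \beta_1 P_{\overline{X}_1^n}(\mathbf{x}) + \beta_2 P_{\overline{X}_2^n}(\mathbf{x}) \quad (\beta_1>0,\ \beta_2>0,\ \beta_1+\beta_2=1) \tag{4.4.50} \]

as the alternative hypothesis. We define $\nu_1$ and $\nu_2$ by (4.4.42) and (4.4.43) in Example 4.4.4, respectively, and $\mu_1$ and $\mu_2$ by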






|q e V{X) 



^ Q{x) log 
xepc 



P2{X) 




(4.4.51) 



M2 



Q e V{X) 









(4.4.52) 



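where $\mathcal{P}(\mathcal{X})$ denotes the space of all probability distributions over $\mathcal{X}$. Furthermore, define the four half-spaces in $\mathcal{P}(\mathcal{X})$ by

\[ \kappa_R^{(1)} = \left\{ Q\in\mathcal{P}(\mathcal{X}) \;\middle|\; \sum_{x\in\mathcal{X}} Q(x)\log\frac{P_1(x)}{\overline{P}_1(x)} \le R \right\}, \tag{4.4.53} \]

\[ \kappa_R^{(2)} = \left\{ Q\in\mathcal{P}(\mathcal{X}) \;\middle|\; \sum_{x\in\mathcal{X}} Q(x)\log\frac{P_1(x)}{\overline{P}_2(x)} \le R \right\}, \tag{4.4.54} \]

\[ \kappa_R^{(3)} = \left\{ Q\in\mathcal{P}(\mathcal{X}) \;\middle|\; \sum_{x\in\mathcal{X}} Q(x)\log\frac{P_2(x)}{\overline{P}_2(x)} \le R \right\}, \tag{4.4.55} \]

\[ \kappa_R^{(4)} = \left\{ Q\in\mathcal{P}(\mathcal{X}) \;\middle|\; \sum_{x\in\mathcal{X}} Q(x)\log\frac{P_2(x)}{\overline{P}_1(x)} \le R \right\}, \tag{4.4.56} \]

and denote by $P_R^{(1)}$ and $P_R^{(2)}$ the projections of $P_1$ and $P_2$ on

\[ \nu_1 \cap \left( (\mu_1\cap\kappa_R^{(1)}) \cup (\mu_2\cap\kappa_R^{(2)}) \right), \]
\[ \nu_2 \cap \left( (\mu_2\cap\kappa_R^{(3)}) \cup (\mu_1\cap\kappa_R^{(4)}) \right), \]

respectively. By applying Sanov’s theorem similarly to Example 4.4.4, we have

\[ \eta(R) = \min\left( D(P_R^{(1)}\|P_1),\ D(P_R^{(2)}\|P_2) \right). \tag{4.4.57} \]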

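If this $\eta(R)$ is substituted into the right-hand side of (4.4.2) in Theorem 4.4.1, we can compute values of $B_e(r|\mathbf{X}\|\overline{\mathbf{X}})$ for the mixed sources as a function of $r$.

We note here that it easily follows from (4.4.57) that $\eta(R) = 0$ if

\[ R > \min(D(P_1\|\overline{P}_1), D(P_1\|\overline{P}_2), D(P_2\|\overline{P}_1), D(P_2\|\overline{P}_2)) \]

and $\eta(R)$ is a monotone decreasing and continuous function of $R$. Therefore,

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) \le \min(D(P_1\|\overline{P}_1), D(P_1\|\overline{P}_2), D(P_2\|\overline{P}_1), D(P_2\|\overline{P}_2)) \quad (\forall r>0). \tag{4.4.58} \]

On the other hand, since $\eta(h) > 0$ is obtained from (4.4.57) for any rate $h$ satisfying

\[ 0 < h < \min(D(P_1\|\overline{P}_1), D(P_1\|\overline{P}_2), D(P_2\|\overline{P}_1), D(P_2\|\overline{P}_2)), \]

it follows that

\[ \inf_R \{ R + \eta(R) \mid \eta(R) < \eta(h) \} \ge h, \]

which implies that $h$ is $\eta(h)$-achievable. Hence, it holds that

\[ \lim_{r\downarrow 0} B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = \min(D(P_1\|\overline{P}_1), D(P_1\|\overline{P}_2), D(P_2\|\overline{P}_1), D(P_2\|\overline{P}_2)). \tag{4.4.59} \]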






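Remark 4.4.3. In fact, we can generalize Example 4.4.4 and Example 4.4.5 in a much simpler way without computation of the information-spectrum. Let $\mathcal{X}$ be an arbitrary (not necessarily finite) alphabet and for four general sources $\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$, $\mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}$, $\overline{\mathbf{X}}_1 = \{\overline{X}_1^n\}_{n=1}^{\infty}$ and $\overline{\mathbf{X}}_2 = \{\overline{X}_2^n\}_{n=1}^{\infty}$ define $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ as the mixed source of $\mathbf{X}_1$ and $\mathbf{X}_2$ and $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ as the mixed source of $\overline{\mathbf{X}}_1$ and $\overline{\mathbf{X}}_2$ given by (4.4.49) and (4.4.50), respectively. Then, we have the following formula on the hypothesis testing $\mathbf{X}$ against $\overline{\mathbf{X}}$:

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = \min_{1\le i,j\le 2} B_e(r|\mathbf{X}_i\|\overline{\mathbf{X}}_j) \quad (\forall r>0). \tag{4.4.60} \]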

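This formula is established in the following way. First, we arbitrarily choose four rates $R_{11}$, $R_{12}$, $R_{21}$ and $R_{22}$ satisfying

\[ R_{ij} < B_e(r|\mathbf{X}_i\|\overline{\mathbf{X}}_j) \quad (\forall i,j = 1,2). \tag{4.4.61} \]

Then, the definition of $B_e(r|\mathbf{X}_i\|\overline{\mathbf{X}}_j)$ guarantees the existence of an acceptance region $A_n^{(i,j)}$ of the hypothesis testing $\mathbf{X}_i$ against $\overline{\mathbf{X}}_j$ satisfying

\[ \liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\mu_n^{(i,j)}} \ge r, \tag{4.4.62} \]

\[ \liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n^{(i,j)}} \ge R_{ij}, \tag{4.4.63} \]

where

\[ \mu_n^{(i,j)} = P_{X_i^n}\big((A_n^{(i,j)})^c\big), \quad \lambda_n^{(i,j)} = P_{\overline{X}_j^n}\big(A_n^{(i,j)}\big) \quad (i,j = 1,2). \tag{4.4.64} \]

We define the acceptance region $A_n$ of the hypothesis testing $\mathbf{X}$ against $\overline{\mathbf{X}}$ by

\[ A_n = \big(A_n^{(1,1)} \cap A_n^{(1,2)}\big) \cup \big(A_n^{(2,1)} \cap A_n^{(2,2)}\big). \tag{4.4.65} \]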

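Since we have

\[ \begin{aligned}
\mu_n = P_{X^n}(A_n^c) &= \alpha_1 P_{X_1^n}(A_n^c) + \alpha_2 P_{X_2^n}(A_n^c) \\
&\le \alpha_1 P_{X_1^n}\big((A_n^{(1,1)}\cap A_n^{(1,2)})^c\big) + \alpha_2 P_{X_2^n}\big((A_n^{(2,1)}\cap A_n^{(2,2)})^c\big) \\
&\le \alpha_1 P_{X_1^n}\big((A_n^{(1,1)})^c\big) + \alpha_1 P_{X_1^n}\big((A_n^{(1,2)})^c\big) + \alpha_2 P_{X_2^n}\big((A_n^{(2,1)})^c\big) + \alpha_2 P_{X_2^n}\big((A_n^{(2,2)})^c\big),
\end{aligned} \]

(4.4.62) and (4.4.64) guarantee

\[ \liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\mu_n} \ge r. \tag{4.4.66} \]

On the other hand, since

\[ \lambda_n = P_{\overline{X}^n}(A_n) = \beta_1 P_{\overline{X}_1^n}(A_n) + \beta_2 P_{\overline{X}_2^n}(A_n) \]

and $A_n \subset A_n^{(1,j)} \cup A_n^{(2,j)}$ for each $j = 1,2$, we obtain from (4.4.63) and (4.4.64) that

\[ \liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n} \ge \min_{1\le i,j\le 2} R_{ij}. \tag{4.4.67} \]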

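By noticing that the rates $R_{11}$, $R_{12}$, $R_{21}$ and $R_{22}$ are arbitrary as long as they satisfy (4.4.61), (4.4.67) means that

\[ \liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n} \ge \min_{1\le i,j\le 2} B_e(r|\mathbf{X}_i\|\overline{\mathbf{X}}_j). \tag{4.4.68} \]

We can conclude from the combination of (4.4.66) and (4.4.68) that the right-hand side of (4.4.68) is $r$-achievable as a rate of the hypothesis testing $\mathbf{X}$ against $\overline{\mathbf{X}}$. That is, we have established the inequality

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) \ge \min_{1\le i,j\le 2} B_e(r|\mathbf{X}_i\|\overline{\mathbf{X}}_j) \tag{4.4.69} \]

meaning the direct part.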

Next, to establish the inequality in the opposite direction, meaning the converse part, let $R$ be an arbitrary $r$-achievable rate of the hypothesis testing $\mathbf{X}$ against $\overline{\mathbf{X}}$ and denote by $A_n$ its corresponding acceptance region. From the definition, we have

\[ \liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\mu_n} \ge r, \tag{4.4.70} \]

\[ \liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n} \ge R, \tag{4.4.71} \]

where

\[ \mu_n = P_{X^n}(A_n^c), \quad \lambda_n = P_{\overline{X}^n}(A_n). \tag{4.4.72} \]


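Now, consider the hypothesis testing $\mathbf{X}_i$ against $\overline{\mathbf{X}}_j$ with this $A_n$ as an acceptance region and set

\[ \mu_n^{(i,j)} = P_{X_i^n}(A_n^c), \quad \lambda_n^{(i,j)} = P_{\overline{X}_j^n}(A_n) \quad (i,j = 1,2). \]

Since

\[ \mu_n = \alpha_1 P_{X_1^n}(A_n^c) + \alpha_2 P_{X_2^n}(A_n^c) = \alpha_1\mu_n^{(1,1)} + \alpha_2\mu_n^{(2,1)}, \tag{4.4.73} \]

it follows that

\[ \mu_n^{(1,1)} = \mu_n^{(1,2)} \le \frac{\mu_n}{\alpha_1}, \qquad \mu_n^{(2,1)} = \mu_n^{(2,2)} \le \frac{\mu_n}{\alpha_2}. \]

Therefore, (4.4.70) guarantees

\[ \liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\mu_n^{(i,j)}} \ge r \quad (\forall i,j = 1,2). \tag{4.4.74} \]

In addition, by noticing that

\[ \lambda_n = \beta_1 P_{\overline{X}_1^n}(A_n) + \beta_2 P_{\overline{X}_2^n}(A_n), \]

we have

\[ \lambda_n^{(1,1)} = \lambda_n^{(2,1)} \le \frac{\lambda_n}{\beta_1}, \qquad \lambda_n^{(1,2)} = \lambda_n^{(2,2)} \le \frac{\lambda_n}{\beta_2}. \]

Hence, it holds from (4.4.71) that

\[ \liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{\lambda_n^{(i,j)}} \ge R \quad (\forall i,j = 1,2). \tag{4.4.75} \]

Equations (4.4.74) and (4.4.75) mean that $R$ is $r$-achievable for all hypothesis testings $\mathbf{X}_i$ against $\overline{\mathbf{X}}_j$ $(i,j = 1,2)$. Thus,

\[ R \le \min_{1\le i,j\le 2} B_e(r|\mathbf{X}_i\|\overline{\mathbf{X}}_j). \tag{4.4.76} \]

We note here that (4.4.76) implies

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) \le \min_{1\le i,j\le 2} B_e(r|\mathbf{X}_i\|\overline{\mathbf{X}}_j) \tag{4.4.77} \]

since $R$ is an arbitrary $r$-achievable rate of the hypothesis testing $\mathbf{X}$ against $\overline{\mathbf{X}}$. Now (4.4.60) follows from the combination of (4.4.69) with (4.4.77).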

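Here, let us apply the formula (4.4.60) to Example 4.4.4 as a special case. Since $\overline{\mathbf{X}}_1 = \overline{\mathbf{X}}_2 = \overline{\mathbf{X}}$, (4.4.60) can be written as

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = \min\big( B_e(r|\mathbf{X}_1\|\overline{\mathbf{X}}),\ B_e(r|\mathbf{X}_2\|\overline{\mathbf{X}}) \big). \]

By substituting the formula (4.4.24) in Example 4.4.1 into the right-hand side of the equation above, we obtain the following simple formula on the hypothesis testing for the mixed sources:

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = \min\left( \inf_{Q:\,D(Q\|P_1)<r} D(Q\|\overline{P}),\ \inf_{Q:\,D(Q\|P_2)<r} D(Q\|\overline{P}) \right). \tag{4.4.78} \]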
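The formula (4.4.78) can be evaluated numerically. The following is a minimal sketch, assuming strictly positive distributions on a finite alphabet and natural logarithms; the function names `kl`, `tilt`, `hoeffding` and `mixed_exponent` are hypothetical and chosen only for this illustration. It relies on the standard fact that the minimizer of $D(Q\|\overline{P})$ under a constraint on $D(Q\|P)$ lies on the family $Q_t \propto P^{1-t}\overline{P}^{\,t}$ $(0 \le t \le 1)$, so a bisection on $t$ suffices; the strict constraint $D(Q\|P) < r$ and the closed one give the same infimum by continuity.

```python
import math

def kl(q, p):
    """Divergence D(q||p) in nats for distributions on a finite alphabet."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def tilt(p, pbar, t):
    """Distribution proportional to p^(1-t) * pbar^t (t=0 gives p, t=1 gives pbar)."""
    w = [pi ** (1 - t) * bi ** t for pi, bi in zip(p, pbar)]
    z = sum(w)
    return [wi / z for wi in w]

def hoeffding(p, pbar, r, iters=100):
    """Approximate inf{ D(Q||pbar) : D(Q||p) < r } for strictly positive p, pbar."""
    if r >= kl(pbar, p):
        return 0.0                      # Q = pbar is (in the limit) feasible
    lo, hi = 0.0, 1.0                   # D(tilt(t)||p) increases with t
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl(tilt(p, pbar, mid), p) < r:
            lo = mid
        else:
            hi = mid
    return kl(tilt(p, pbar, lo), pbar)

def mixed_exponent(p_list, pbar_list, r):
    """Minimum over all component pairs, as in (4.4.60), (4.4.78), (4.4.79)."""
    return min(hoeffding(p, pb, r) for p in p_list for pb in pbar_list)
```

For instance, `mixed_exponent([[0.5, 0.5], [0.8, 0.2]], [[0.2, 0.8]], 0.05)` evaluates the right-hand side of (4.4.78) for a two-component null hypothesis against a single memoryless alternative.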






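Similarly, if we consider the case of Example 4.4.5, (4.4.60), together with the formula (4.4.24) in Example 4.4.1, yields the following simple formula on the hypothesis testing for the mixed sources:

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = \min_{1\le i,j\le 2}\ \inf_{Q:\,D(Q\|P_i)<r} D(Q\|\overline{P}_j). \tag{4.4.79} \]

In addition, we also have the following formula of $B(\mathbf{X}\|\overline{\mathbf{X}})$ (see Definition 4.1.1 and Definition 4.1.2):

\[ B(\mathbf{X}\|\overline{\mathbf{X}}) = \min_{1\le i,j\le 2} B(\mathbf{X}_i\|\overline{\mathbf{X}}_j) = \min_{1\le i,j\le 2} \underline{D}(\mathbf{X}_i\|\overline{\mathbf{X}}_j), \tag{4.4.80} \]

which can be verified by using the argument yielding the formula (4.4.60) (see Theorem 4.1.1 and Example 4.1.1). □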



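Example 4.4.6. In the mixed hypothesis testing given in Remark 4.4.3, suppose that $\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$, $\mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}$, $\overline{\mathbf{X}}_1 = \{\overline{X}_1^n\}_{n=1}^{\infty}$ and $\overline{\mathbf{X}}_2 = \{\overline{X}_2^n\}_{n=1}^{\infty}$ are first-order stationary irreducible Markov sources subject to transition probabilities $P_1(\cdot|\cdot)$, $P_2(\cdot|\cdot)$, $\overline{P}_1(\cdot|\cdot)$ and $\overline{P}_2(\cdot|\cdot)$, respectively (assume that $\mathcal{X}$ is a finite source alphabet). In this case, by substituting (4.4.28) in Example 4.4.2 into the formula (4.4.60) in Remark 4.4.3, we obtain the following formula on the mixed hypothesis testing $\mathbf{X}$ against $\overline{\mathbf{X}}$:

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = \min_{1\le i,j\le 2}\ \inf_{Q\in\mathcal{V}_0:\,D(Q\|P_i|q)<r} D(Q\|\overline{P}_j|q) \quad (\forall r>0). \tag{4.4.81} \]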



Example 4.4.7. Let us consider here the case that $\mathcal{X}$ is a countably infinite alphabet, say $\mathcal{X} = \{1, 2, \cdots\}$. Then, we can use Cramér’s theorem (cf. Dembo and Zeitouni [22]), which always holds, although Sanov’s theorem used in Example 4.4.1 and Example 4.4.2 does not always hold. First, let $P = (p_1, p_2, \cdots)$ and $\overline{P} = (\overline{p}_1, \overline{p}_2, \cdots)$ be two arbitrary probability distributions over $\mathcal{X}$ and denote by $X$ and $\overline{X}$ the random variables that are equal to $k$ with probabilities $p_k$ and $\overline{p}_k$ $(k = 1, 2, \cdots)$, respectively. Let $\mathbf{X} = \{X^n = (X_1, X_2, \cdots, X_n)\}_{n=1}^{\infty}$ and $\overline{\mathbf{X}} = \{\overline{X}^n = (\overline{X}_1, \overline{X}_2, \cdots, \overline{X}_n)\}_{n=1}^{\infty}$ be the stationary memoryless sources specified by $X$ and $\overline{X}$, respectively. Since the divergence density rate can be written as

\[ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} = \frac{1}{n}\sum_{i=1}^{n}\log\frac{P_X(X_i)}{P_{\overline{X}}(X_i)}, \tag{4.4.82} \]

$\eta(R)$ in (4.4.1) can be expressed as

\[ \eta(R) = \inf_{x\le R} I(x), \tag{4.4.83} \]

where $I(x)$ denotes the large deviation rate function of (4.4.82). If we notice here that the moment generating function $M(\theta)$ of $\log\frac{P_X(X)}{P_{\overline{X}}(X)}$ is written as

\[ M(\theta) = \sum_{i=1}^{\infty} p_i\left(\frac{p_i}{\overline{p}_i}\right)^{\theta} = \sum_{i=1}^{\infty} p_i^{1+\theta}\,\overline{p}_i^{\,-\theta}, \tag{4.4.84} \]

Cramér’s theorem tells that the rate function $I(x)$ is given by

\[ I(x) = \sup_{\theta}\,(\theta x - \Lambda(\theta)), \tag{4.4.85} \]

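where $\Lambda(\theta) = \log M(\theta)$ and $-\log M(\theta)$ is usually called Chernoff’s $\theta$-distance (cf. Blahut [11], Cover and Thomas [17]). Here, since the expectation of $\log\frac{P_X(X)}{P_{\overline{X}}(X)}$ is

\[ E\log\frac{P_X(X)}{P_{\overline{X}}(X)} = \sum_{i=1}^{\infty} p_i\log\frac{p_i}{\overline{p}_i} = D(P\|\overline{P}) \quad \text{(divergence)}, \]

we notice from (4.4.83) that $\eta(R) = 0$ for $R \ge D(P\|\overline{P})$ and $\eta(R) = I(R)$ for $R < D(P\|\overline{P})$ ($I(x)$ is monotone increasing for $x > D(P\|\overline{P})$, monotone decreasing for $x < D(P\|\overline{P})$ and $I(x) = 0$ at $x = D(P\|\overline{P})$). Therefore, we obtain the formula for computing values of $B_e(r|\mathbf{X}\|\overline{\mathbf{X}})$ by substituting (4.4.83) into (4.4.2) in Theorem 4.4.1.

If (4.4.84) is substituted into (4.4.85) with $x = R$, we obtain

\[ I(R) = \sup_{\theta}\left( \theta R - \log\sum_{i=1}^{\infty} p_i^{1+\theta}\,\overline{p}_i^{\,-\theta} \right). \]

Thus, we can compute $I(R)$ by using this equation. To this end, we differentiate the terms on the right-hand side with respect to $\theta$ and set the derivative to 0. Then, we have the following equation with respect to $\theta$:

\[ R = \frac{\sum_{i=1}^{\infty} p_i^{1+\theta}\,\overline{p}_i^{\,-\theta}\log\frac{p_i}{\overline{p}_i}}{\sum_{i=1}^{\infty} p_i^{1+\theta}\,\overline{p}_i^{\,-\theta}} \equiv \varphi(\theta). \tag{4.4.87} \]

As long as $P \neq \overline{P}$ is satisfied, $\varphi(\theta)$ on the right-hand side turns out to be a continuous and strictly monotone increasing function of $\theta$ owing to the term by term differentiability of $M(\theta)$ (cf. Dembo and Zeitouni [22]), which is easily verified by using the Schwarz inequality (cf. Gallager [30]). If we define $\mathcal{D} = \{\varphi(\theta) \mid -\infty < \theta < +\infty\}$, then $\mathcal{D}$ forms an interval on the real line. Consequently, if $R \in \mathcal{D}$, $I(R)$ can be computed as

\[ I(R) = \theta R - \log\sum_{i=1}^{\infty} p_i^{1+\theta}\,\overline{p}_i^{\,-\theta}, \tag{4.4.88} \]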






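where $\theta$ is determined by (4.4.87). In this case, denoting by $\mathcal{P}(\mathcal{X})$ the set of all probability distributions over $\mathcal{X}$ and by $Q_R$ the projection of $P$ on the plane in $\mathcal{P}(\mathcal{X})$ defined by

\[ \left\{ Q\in\mathcal{P}(\mathcal{X}) \;\middle|\; \sum_{i=1}^{\infty} Q(i)\log\frac{p_i}{\overline{p}_i} = R \right\}, \]

we can verify by direct computation that

\[ I(R) = D(Q_R\|P) \tag{4.4.89} \]

and

\[ Q_R(i) = \frac{p_i^{1+\theta}\,\overline{p}_i^{\,-\theta}}{\sum_{j=1}^{\infty} p_j^{1+\theta}\,\overline{p}_j^{\,-\theta}} \quad (i\in\mathcal{X}) \tag{4.4.90} \]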



with $\theta$ satisfying (4.4.87). That is, if $R\in\mathcal{D}$, then Cramér’s theorem is reduced to Sanov’s theorem as in Example 4.4.1 for a finite alphabet case. However, equality such as (4.4.89) does not hold for $R$ satisfying $R\notin\mathcal{D}$. Therefore, it is important to know what kind of interval $\mathcal{D}$ forms. In particular, if

\[ D(P\|\overline{P}) < +\infty, \quad D(\overline{P}\|P) < +\infty, \tag{4.4.91} \]

then

\[ [-D(\overline{P}\|P),\ D(P\|\overline{P})] \subset \mathcal{D}. \]

Thus, in this case we obtain

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = \inf_{Q:\,D(Q\|P)<r} D(Q\|\overline{P}) \tag{4.4.92} \]

for $0 < r < D(\overline{P}\|P)$ from Sanov’s theorem in the same way as in Example 4.4.1 (see Fig. 4.5). This equation clearly holds also for $r \ge D(\overline{P}\|P)$; we have $B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = 0$ in this case.

Fig. 4.5.

The formula (4.4.92) is regarded as an extension of Hoeffding’s theorem in (4.4.24) for the case of a finite alphabet
to the case with a countably infinite alphabet $\mathcal{X}$. In fact, the formula (4.4.92) holds in general with any infinite alphabet $\mathcal{X}$ that is not necessarily countably infinite under the condition (4.4.91). This follows from the fact that, if we rewrite $\frac{p_i}{\overline{p}_i}$, appearing in the proof above, in the equivalent form of $\frac{dP}{d\overline{P}}$, then $\frac{dP}{d\overline{P}}$ is well-defined as the Radon-Nikodym derivative (cf. Billingsley [9]) for any infinite alphabet $\mathcal{X}$. We note here that the condition (4.4.91) is equivalent to the condition that the probability measure $P$ is absolutely continuous with respect to the probability measure $\overline{P}$ and, conversely, $\overline{P}$ is absolutely continuous with respect to $P$.

A Cramér type equivalent of the formula (4.4.92) under the condition (4.4.91) is found in Dembo and Zeitouni [22], where the Neyman-Pearson lemma is directly invoked, while here Theorem 4.4.1 is invoked. □
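The computation of $I(R)$ through (4.4.87) and (4.4.88) can be sketched numerically. The code below is only an illustration under stated assumptions: the two distributions are geometric and truncated to finitely many terms (the neglected tail mass is far below floating-point precision), logarithms are natural, $R$ is taken inside $[-D(\overline{P}\|P), D(P\|\overline{P})]$ so that the solution of (4.4.87) lies in $-1 \le \theta \le 0$, and all function names are hypothetical.

```python
import math

def geometric(q, n_terms=200):
    """Truncated geometric distribution p_k = (1-q) q^(k-1), k = 1..n_terms."""
    return [(1 - q) * q ** (k - 1) for k in range(1, n_terms + 1)]

def phi_and_log_m(p, pbar, theta):
    """Return (phi(theta), log M(theta)) with M(theta) = sum_i p_i^(1+theta) pbar_i^(-theta)."""
    logs = [math.log(a / b) for a, b in zip(p, pbar)]
    w = [a * math.exp(theta * l) for a, l in zip(p, logs)]
    m = sum(w)
    return sum(wi * li for wi, li in zip(w, logs)) / m, math.log(m)

def rate_function(p, pbar, R, iters=200):
    """Solve phi(theta) = R by bisection on [-1, 0]; return (theta, I(R)) as in (4.4.88)."""
    lo, hi = -1.0, 0.0          # phi(-1) = -D(pbar||p), phi(0) = D(p||pbar)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if phi_and_log_m(p, pbar, mid)[0] < R:
            lo = mid            # phi is increasing in theta
        else:
            hi = mid
    theta = (lo + hi) / 2
    return theta, theta * R - phi_and_log_m(p, pbar, theta)[1]

def tilted(p, pbar, theta):
    """Q_R of (4.4.90): proportional to p_i^(1+theta) pbar_i^(-theta)."""
    w = [a ** (1 + theta) * b ** (-theta) for a, b in zip(p, pbar)]
    z = sum(w)
    return [x / z for x in w]

def kl(q, p):
    """Divergence D(q||p) in nats."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)
```

With `p = geometric(0.5)` and `pbar = geometric(0.6)`, calling `rate_function(p, pbar, 0.0)` returns a `theta` strictly between -1 and 0, and the returned rate agrees with `kl(tilted(p, pbar, theta), p)`, which is the identity (4.4.89).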

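Example 4.4.8. Let us consider the hypothesis testing for autoregressive processes given in Example 1.9.9 in §1.9. Let the null hypothesis $\mathbf{X} = \{X^n = (X_1, X_2, \ldots, X_n)\}_{n=1}^{\infty}$ and the alternative hypothesis $\overline{\mathbf{X}} = \{\overline{X}^n = (\overline{X}_1, \overline{X}_2, \ldots, \overline{X}_n)\}_{n=1}^{\infty}$ be the autoregressive processes defined by

\[ X_n = aX_{n-1} + W_n \quad (0 < a < 1;\ n = 1, 2, \cdots), \]
\[ \overline{X}_n = a\overline{X}_{n-1} + \overline{W}_n \quad (0 < a < 1;\ n = 1, 2, \cdots), \]

respectively, where

\[ \mathbf{W} = (W_1, W_2, \cdots) \quad (W^n = (W_1, \cdots, W_n)), \]
\[ \overline{\mathbf{W}} = (\overline{W}_1, \overline{W}_2, \cdots) \quad (\overline{W}^n = (\overline{W}_1, \cdots, \overline{W}_n)) \]

are the stationary memoryless sources subject to probability distributions $P_W$ and $P_{\overline{W}}$ over the same alphabet $\mathcal{W}$, respectively, and we define $X_0 = \overline{X}_0 = 0$. The alphabet $\mathcal{W}$ can be any countably infinite set, continuous set or even any subset of real numbers. Then, since there is a one-to-one correspondence between the two divergence density rates

\[ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \quad\text{and}\quad \frac{1}{n}\log\frac{P_{W^n}(W^n)}{P_{\overline{W}^n}(W^n)}, \]

the information-spectrum of the former coincides with the information-spectrum of the latter. Therefore, if the conditions $D(P_W\|P_{\overline{W}}) < +\infty$ and $D(P_{\overline{W}}\|P_W) < +\infty$ corresponding to (4.4.91) are satisfied, from the formula (4.4.92) developed in Example 4.4.7 we obtain the following formula on the hypothesis testing for autoregressive sources:

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = \inf_{Q:\,D(Q\|P_W)<r} D(Q\|P_{\overline{W}}). \tag{4.4.93} \]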
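The one-to-one correspondence used above can be made concrete. The following sketch is an illustration only: it assumes a finite innovation alphabet, exact rational arithmetic via `fractions.Fraction` so that the inverse map recovers the innovations without rounding, and hypothetical function names. It checks that the divergence density rate computed from an observed path equals the one computed from the innovations.

```python
import math
from fractions import Fraction

def ar_path(a, w):
    """x_n = a * x_{n-1} + w_n with x_0 = 0."""
    x, prev = [], Fraction(0)
    for wn in w:
        prev = a * prev + wn
        x.append(prev)
    return x

def innovations(a, x):
    """Inverse map: w_n = x_n - a * x_{n-1} with x_0 = 0."""
    w, prev = [], Fraction(0)
    for xn in x:
        w.append(xn - a * prev)
        prev = xn
    return w

def density_rate_of_w(w, pw, pwbar):
    """(1/n) log P_{W^n}(w) / P_{Wbar^n}(w) for memoryless innovations."""
    return sum(math.log(pw[s] / pwbar[s]) for s in w) / len(w)

def density_rate_of_x(a, x, pw, pwbar):
    """(1/n) log P_{X^n}(x) / P_{Xbar^n}(x), computed through the inverse map."""
    return density_rate_of_w(innovations(a, x), pw, pwbar)
```

Because both hypotheses share the same coefficient $a$, the map from the innovations to the path is the same under both, which is why the two rates coincide sample by sample.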



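Example 4.4.9. Let us consider here the case that the null hypothesis $\mathbf{X}$ and the alternative hypothesis $\overline{\mathbf{X}}$ are the stationary memoryless sources subject to Gaussian distributions $N(\kappa, \sigma^2)$ and $N(\overline{\kappa}, \sigma^2)$, respectively. We denote the probability density functions by

\[ P_{\kappa}(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\kappa)^2}{2\sigma^2}}, \qquad P_{\overline{\kappa}}(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\overline{\kappa})^2}{2\sigma^2}}. \]

Denote by $X$ the random variable subject to the probability density function $P_{\kappa}$. Since the moment generating function $M(\theta) = E(e^{\theta Y})$ of

\[ Y = \log\frac{P_{\kappa}(X)}{P_{\overline{\kappa}}(X)} \tag{4.4.94} \]

is computed as

\[ M(\theta) = e^{\frac{\theta(1+\theta)(\kappa-\overline{\kappa})^2}{2\sigma^2}}, \]

we have

\[ \theta x - \log M(\theta) = \theta x - \frac{\theta(1+\theta)(\kappa-\overline{\kappa})^2}{2\sigma^2}. \]

Then, simple computation tells us that the large deviation rate function $I(x)$ for (4.4.94) can be written as

\[ I(x) = \sup_{\theta}\,(\theta x - \log M(\theta)) = \frac{\sigma^2 (x-a)^2}{2(\kappa-\overline{\kappa})^2}, \tag{4.4.95} \]

where we set $a = \frac{(\kappa-\overline{\kappa})^2}{2\sigma^2}$ for simplicity. In addition, we note that $D(P_{\kappa}\|P_{\overline{\kappa}}) = a$. Now, Cramér’s theorem implies that $\eta(R)$ in Theorem 4.4.1 can be computed as

\[ \eta(R) = \inf_{x\le R} I(x) = \min_{x\le R}\frac{\sigma^2 (x-a)^2}{2(\kappa-\overline{\kappa})^2} = \frac{\sigma^2 \left([a-R]^+\right)^2}{2(\kappa-\overline{\kappa})^2}. \tag{4.4.96} \]

Furthermore, since

\[ R + \eta(R) = R + \frac{\sigma^2 \left([a-R]^+\right)^2}{2(\kappa-\overline{\kappa})^2}, \tag{4.4.97} \]

the substitution of (4.4.96) and (4.4.97) into the right-hand side of (4.4.2) in Theorem 4.4.1 and a little computation lead to

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = \min\left\{ [a-r]^+,\ (\sqrt{r}-\sqrt{a})^2 \right\} = (\sqrt{r}-\sqrt{a})^2\, \mathbf{1}[r \le a], \]

where $\mathbf{1}[\,\cdot\,]$ denotes the characteristic function (Fig. 4.6). This formula tells

Fig. 4.6.

us that $B_e(r|\mathbf{X}\|\overline{\mathbf{X}})$ is monotone decreasing for $0 < r < a$. It also tells us that $B_e(0|\mathbf{X}\|\overline{\mathbf{X}}) = a = D(P_{\kappa}\|P_{\overline{\kappa}})$ and $B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = 0$ for $r \ge a$. □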
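The closed form of the Gaussian example can be checked against a direct evaluation of the infimum in (4.4.2). The sketch below is an illustration only; it assumes natural logarithms, writes the rate function as $(x-a)^2/(4a)$ (the same quantity as in (4.4.95), since $4a = 2(\kappa-\overline{\kappa})^2/\sigma^2$), searches $R$ on a finite grid over $[-3a, 3a]$ (adequate for $r \le 4a$), and uses hypothetical function names.

```python
import math

def eta(R, a):
    """eta(R) = ([a - R]^+)^2 / (4a), cf. (4.4.96) with a = (kappa - kappa_bar)^2 / (2 sigma^2)."""
    return max(a - R, 0.0) ** 2 / (4 * a)

def be_numeric(r, a, steps=20001):
    """Grid approximation of inf{ R + eta(R) : eta(R) < r } over R in [-3a, 3a]."""
    best = math.inf
    for k in range(steps):
        R = -3 * a + 6 * a * k / (steps - 1)
        e = eta(R, a)
        if e < r:
            best = min(best, R + e)
    return best

def be_closed(r, a):
    """Closed form (sqrt(r) - sqrt(a))^2 * 1[r <= a]."""
    return (math.sqrt(r) - math.sqrt(a)) ** 2 if r <= a else 0.0
```

For $r < a$ the infimum is approached at the boundary point $R = a - 2\sqrt{ar}$, while for $r > a$ it is attained at $R = -a$, where $R + \eta(R) = 0$.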



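Example 4.4.10. In all the examples that we have given so far, the function $\eta(R)$ is continuous in $R$. However, we can construct an example where $\eta(R)$ is not continuous in the following way. Let the source alphabet be $\mathcal{X} = \{0, 1\}$ and fix a subset $S_n$ of $\mathcal{X}^n$ satisfying $|S_n| = 2^{\alpha n}$ arbitrarily, where $\alpha$ is a constant satisfying $0 < \alpha < 1$. We also arbitrarily fix $\mathbf{x}_0$ and $\mathbf{x}_1$ satisfying $\mathbf{x}_0, \mathbf{x}_1 \in \mathcal{X}^n - S_n$ and $\mathbf{x}_0 \neq \mathbf{x}_1$. Now, we define the null hypothesis $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ as

\[ P_{X^n}(\mathbf{x}) = \begin{cases}
2^{-2\alpha n} & \text{for } \mathbf{x}\in S_n, \\
2^{-3\alpha n} & \text{for } \mathbf{x} = \mathbf{x}_1, \\
1 - 2^{-\alpha n} - 2^{-3\alpha n} & \text{for } \mathbf{x} = \mathbf{x}_0, \\
0 & \text{for } \mathbf{x}\notin S_n\cup\{\mathbf{x}_1, \mathbf{x}_0\},
\end{cases} \tag{4.4.98} \]

which clearly satisfies $P_{X^n}(S_n) = 2^{-\alpha n}$. We define the alternative hypothesis $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ as $P_{\overline{X}^n}(\mathbf{x}) = 2^{-n}$ $(\forall \mathbf{x}\in\mathcal{X}^n)$. Then, simple computation tells us that the divergence-spectrum becomes the three-point spectrum with three peaks of probabilities $1 - 2^{-\alpha n} - 2^{-3\alpha n}$, $2^{-\alpha n}$ and $2^{-3\alpha n}$ at $1 + \frac{1}{n}\log(1 - 2^{-\alpha n} - 2^{-3\alpha n})$, $1 - 2\alpha$ and $1 - 3\alpha$, respectively. Thus, $\eta(R)$ is computed from its definition as follows:

\[ \eta(R) = \begin{cases}
+\infty & \text{for } R < 1-3\alpha, \\
3\alpha & \text{for } 1-3\alpha \le R < 1-2\alpha, \\
\alpha & \text{for } 1-2\alpha \le R < 1, \\
0 & \text{for } 1 \le R.
\end{cases} \tag{4.4.99} \]

Then, $R + \eta(R)$ is expressed as

\[ R + \eta(R) = \begin{cases}
+\infty & \text{for } R < 1-3\alpha, \\
R + 3\alpha & \text{for } 1-3\alpha \le R < 1-2\alpha, \\
R + \alpha & \text{for } 1-2\alpha \le R < 1, \\
R & \text{for } 1 \le R.
\end{cases} \tag{4.4.100} \]

By using Theorem 4.4.1, we obtain the following formula:

\[ B_e(r|\mathbf{X}\|\overline{\mathbf{X}}) = \begin{cases}
1-\alpha & \text{for } r > \alpha, \\
1 & \text{for } 0 < r \le \alpha.
\end{cases} \tag{4.4.101} \]

We note here that, if $r > \alpha$, $\inf_R$ on the right-hand side of (4.4.2) is attained by $R = R^{\circ} \equiv 1 - 2\alpha$ as

\[ \inf_R \{ R + \eta(R) \mid \eta(R) < r \} = R^{\circ} + \eta(R^{\circ}) \quad (R^{\circ} = 1 - 2\alpha) \]
\[ = 1 - \alpha. \]

In particular, if $r > 3\alpha$, $\inf_R$ is not attained by the boundary point $R = \inf\{R \mid \eta(R) < r\} = 1 - 3\alpha$ of $\{R \mid \eta(R) < r\}$, but attained by the internal point $R = R^{\circ} = 1 - 2\alpha$. This kind of phenomenon only occurs in the hypothesis testing treating general sources that do not satisfy the consistency condition, i.e., such a phenomenon never occurs as long as we treat ordinary sources given in the preceding examples. □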
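The piecewise formulas (4.4.99) to (4.4.101) are simple enough to check mechanically. The following sketch is only an illustration with hypothetical function names; it uses the fact that $R + \eta(R)$ is increasing on each piece, so the infimum in (4.4.2) can only be attained at one of the three left endpoints.

```python
import math

def eta(R, alpha):
    """Three-step function eta(R) of (4.4.99)."""
    if R < 1 - 3 * alpha:
        return math.inf
    if R < 1 - 2 * alpha:
        return 3 * alpha
    if R < 1:
        return alpha
    return 0.0

def be(r, alpha):
    """inf{ R + eta(R) : eta(R) < r } for r > 0, checked at the left endpoints only."""
    candidates = [1 - 3 * alpha, 1 - 2 * alpha, 1.0]
    values = [R + eta(R, alpha) for R in candidates if eta(R, alpha) < r]
    return min(values)
```

With `alpha = 0.2`, `be(0.1, 0.2)` gives 1, while `be(0.7, 0.2)` gives $1 - \alpha$ up to rounding even though the boundary point $R = 1 - 3\alpha$ is admissible, which is the interior-point phenomenon described above.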



4.5 Hypothesis Testing and Large Deviation: Probability of Correct Testing

In §4.4 we have considered the large deviation behavior of the error probability $\lambda_n$ of the second kind subject to the constraint that the error probability of the first kind $\mu_n$ asymptotically satisfies $\mu_n \le e^{-nr}$ for a constant $r > 0$. However, $\lambda_n$ comes to satisfy $\lambda_n \to 1$ if $r > 0$ is sufficiently large (see Example 4.4.1). In this kind of situation it is important to investigate the large deviation behavior of $1 - \lambda_n$, the probability of correct testing against the alternative hypothesis, instead of the error probability $\lambda_n$ itself. The problem to be considered is how small the exponent $R > 0$ can be when $1 - \lambda_n$ is expressed as $1 - \lambda_n \simeq e^{-nR}$. This section is devoted to analysis of this problem. We begin with formulation of this problem. The null hypothesis and the alternative hypothesis are denoted by $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ in this section as well.



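Definition 4.5.1.

Rate $R$ is $r$-achievable $\overset{\text{def}}{\Longleftrightarrow}$ There exists an acceptance region $A_n$ satisfying

\[ \liminf_{n\to\infty}\frac{1}{n}\log\frac{1}{\mu_n} \ge r \quad\text{and}\quad \limsup_{n\to\infty}\frac{1}{n}\log\frac{1}{1-\lambda_n} \le R. \]

Definition 4.5.2 (infimum r-achievable correct probability exponent).

\[ B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}}) = \inf\{ R \mid R \text{ is } r\text{-achievable} \}. \]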



The objective of this section is determination of this $B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}})$ as a (left-continuous and monotone increasing) function of $r$. To this end, let us define a function $\eta(R)$ by

\[ \eta(R) = \lim_{n\to\infty}\frac{1}{n}\log\frac{1}{\Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le R \right\}}. \tag{4.5.1} \]

This function is in the same form as $\eta(R)$ in (4.4.1) in the preceding section. However, $\eta(R)$ in (4.5.1) is different from $\eta(R)$ in (4.4.1) in the sense that the existence of the limit on the right-hand side of (4.5.1) is assumed. We also note that Lemma 4.4.1 implies that $\eta(R)$ is monotone decreasing in $R$ and $\eta(R) = 0$ for $R > D(\mathbf{X}\|\overline{\mathbf{X}})$.

We give one assumption on the information-spectrum. That is, we assume that for any constant $M > 0$ there exists some sufficiently large constant $K > 0$ satisfying

\[ \liminf_{n\to\infty}\frac{1}{n}\log\frac{1}{\Pr\left\{ \frac{1}{n}\log\frac{P_{\overline{X}^n}(\overline{X}^n)}{P_{X^n}(\overline{X}^n)} \ge K \right\}} \ge M. \tag{4.5.2} \]

Remark 4.5.1. This assumption means that the information-spectrum of $\overline{\mathbf{X}}$ with respect to $\mathbf{X}$ does not shift to the right more than some specified speed as $n$ increases. For example, if $\mathbf{X}$ and $\overline{\mathbf{X}}$ are the stationary memoryless sources subject to probability distributions $P_X$ and $P_{\overline{X}}$ over a finite alphabet $\mathcal{X}$, respectively, and there is no $x\in\mathcal{X}$ satisfying $P_X(x) = 0$ and $P_{\overline{X}}(x) > 0$, the assumption (4.5.2) trivially holds. The assumption (4.5.2) also holds for other stationary sources that we usually treat. □






We have the following theorem that is a dual counterpart of Theorem 4.4.1.

Theorem 4.5.1 (Han [38]). Assume that (4.5.2) holds. Then, for any $r > 0$

\[ B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}}) = \inf_R \left\{ R + \eta(R) + [r - \eta(R)]^+ \right\}, \tag{4.5.3} \]

where $B_e^*(0|\mathbf{X}\|\overline{\mathbf{X}})$ $(r = 0)$ is defined as 0.

Remark 4.5.2. Lemma 4.4.1 implies that

\[ \inf_{R > D(\mathbf{X}\|\overline{\mathbf{X}})} \left\{ R + \eta(R) + [r - \eta(R)]^+ \right\} = \inf_{R > D(\mathbf{X}\|\overline{\mathbf{X}})} (R + r), \]

where the infimum on the right-hand side is attained at $R = D(\mathbf{X}\|\overline{\mathbf{X}})$. Hence, $\inf_R$ on the right-hand side of (4.5.3) can be replaced with $\inf_{R \le D(\mathbf{X}\|\overline{\mathbf{X}})}$ if $\eta(R)$ is continuous at $R = D(\mathbf{X}\|\overline{\mathbf{X}})$. Recently, another general expression for $B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}})$ was given by Nagaoka and Hayashi [69]. □



Proof of Theorem 4.5.1.

1) Direct part:

In the proof of the direct part we do not need the assumption (4.5.2). First, keep in mind that $\eta(R)$ in

\[ R + \eta(R) + [r - \eta(R)]^+ \]

on the right-hand side of (4.5.3) is monotone decreasing, and set

\[ \rho_0 = \inf_R \left\{ R + \eta(R) + [r - \eta(R)]^+ \right\}. \tag{4.5.4} \]

Then, there exists an $R_0$ such that $\rho_0$ is expressed as

\[ \rho_0 = \lim_{\varepsilon\downarrow 0}\left( R_0 + \varepsilon + \eta(R_0 + \varepsilon) + [r - \eta(R_0 + \varepsilon)]^+ \right), \tag{4.5.5} \]

which we rewrite as

\[ \rho_0 = R_0 + \gamma + \eta(R_0 + \gamma) + [r - \eta(R_0 + \gamma)]^+ - \nu(\gamma), \tag{4.5.6} \]

where $\gamma > 0$ is an arbitrarily small constant and $\nu(\gamma) \to 0$ as $\gamma \to 0$. We use here the notation that

\[ S_n^*(a) = \left\{ \mathbf{x}\in\mathcal{X}^n \;\middle|\; \frac{1}{n}\log\frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} \le a \right\}. \tag{4.5.7} \]

Then, since the existence of the limit in (4.5.1) is assumed, we have

\[ e^{-n(\eta(R_0+\gamma)+\tau)} \le \Pr\{X^n \in S_n^*(R_0+\gamma)\} \le e^{-n(\eta(R_0+\gamma)-\tau)} \quad (\forall n \ge n_0), \tag{4.5.8} \]

where $\tau > 0$ is an arbitrarily small constant. Next, define a subset $C_n$ of $S_n^*(R_0+\gamma)$ as follows: if $\eta(R_0+\gamma) \ge r$ then set $C_n = S_n^*(R_0+\gamma)$, otherwise set $C_n = T_n$, where $T_n$ is an arbitrary subset of $S_n^*(R_0+\gamma)$ satisfying

\[ \lim_{n\to\infty}\frac{1}{n}\log\frac{1}{\Pr\{X^n \in T_n\}} = r. \tag{4.5.9} \]

It should be noted here that it is always possible to choose such a subset $T_n$, because in the case with $\eta(R_0+\gamma) < r$ we can make $\eta(R_0+\gamma) + \tau < r$ hold with a $\tau > 0$ small enough, where we may consider a randomized hypothesis testing if necessary. Now, consider the hypothesis testing with $C_n$ as the critical region. First, we evaluate the error probability of the first kind $\mu_n$. In the case with $\eta(R_0+\gamma) \ge r$, since $C_n = S_n^*(R_0+\gamma)$, by means of (4.5.8) we have

\[ \Pr\{X^n \in C_n\} \le e^{-n(\eta(R_0+\gamma)-\tau)} \le e^{-n(r-\tau)} \quad (\forall n \ge n_0), \]

while in the case where $\eta(R_0+\gamma) < r$, by means of (4.5.9) we have

\[ \Pr\{X^n \in C_n\} \le e^{-n(r-\tau)} \quad (\forall n \ge n_0). \]

Then, in either case, it holds that

\[ \Pr\{X^n \in C_n\} \le e^{-n(r-\tau)}. \tag{4.5.10} \]

Therefore, the error probability of the first kind $\mu_n$ is evaluated as

\[ \mu_n = \Pr\{X^n \in C_n\} \le e^{-n(r-\tau)}. \]

Hence,

\[ \liminf_{n\to\infty}\frac{1}{n}\log\frac{1}{\mu_n} \ge r - \tau. \]

Since $\tau > 0$ is arbitrary, we can conclude that

\[ \liminf_{n\to\infty}\frac{1}{n}\log\frac{1}{\mu_n} \ge r. \tag{4.5.11} \]

Next, we evaluate $1 - \lambda_n$, the probability of correct testing, where $\lambda_n$ is the error probability of the second kind. First, we observe that if $\mathbf{x}\in S_n^*(R_0+\gamma)$ then

\[ P_{\overline{X}^n}(\mathbf{x}) \ge P_{X^n}(\mathbf{x})\, e^{-n(R_0+\gamma)} \tag{4.5.12} \]

holds. Then, in the case of $\eta(R_0+\gamma) \ge r$, since $C_n = S_n^*(R_0+\gamma)$, it follows from (4.5.8) that

\[ \begin{aligned}
\Pr\{\overline{X}^n \in C_n\} &= \sum_{\mathbf{x}\in C_n} P_{\overline{X}^n}(\mathbf{x}) \\
&\ge \sum_{\mathbf{x}\in C_n} P_{X^n}(\mathbf{x})\, e^{-n(R_0+\gamma)} \\
&= e^{-n(R_0+\gamma)} \Pr\{X^n \in S_n^*(R_0+\gamma)\} \\
&\ge e^{-n(R_0+\gamma+\eta(R_0+\gamma)+\tau)}.
\end{aligned} \tag{4.5.13} \]

Similarly, in the case of $\eta(R_0+\gamma) < r$, since $C_n = T_n$, it follows from (4.5.9) that

\[ \Pr\{\overline{X}^n \in C_n\} \ge e^{-n(R_0+\gamma+r+\tau)} \quad (\forall n \ge n_0). \tag{4.5.14} \]

Summarizing (4.5.13) and (4.5.14), in either case we have

\[ \Pr\{\overline{X}^n \in C_n\} \ge e^{-n(R_0+\gamma+\eta(R_0+\gamma)+[r-\eta(R_0+\gamma)]^+ +\tau)}. \tag{4.5.15} \]

Substitution of (4.5.6) into (4.5.15) yields

\[ \Pr\{\overline{X}^n \in C_n\} \ge e^{-n(\rho_0+\tau+\nu(\gamma))}. \]

Hence,

\[ 1 - \lambda_n = \Pr\{\overline{X}^n \in C_n\} \ge e^{-n(\rho_0+\tau+\nu(\gamma))}, \]

from which it follows that

\[ \limsup_{n\to\infty}\frac{1}{n}\log\frac{1}{1-\lambda_n} \le \rho_0 + \tau + \nu(\gamma). \tag{4.5.16} \]

We notice here that we can make $\tau + \nu(\gamma) \to 0$, because $\tau > 0$ and $\gamma > 0$ are both made arbitrarily small. Thus, by virtue of (4.5.11) and (4.5.16) we can conclude that any rate $R$ satisfying $R > \rho_0$ is $r$-achievable.



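2) Converse part:

In the proof of the converse part we need the assumption (4.5.2). First, let $K > 0$ be a large enough constant (to be specified below) and $\gamma > 0$ be an arbitrarily small constant. Putting $L = \frac{2K}{\gamma}$, we divide the interval $(-K, K]$ into $L$ subintervals with equal width $\gamma$ to have

\[ I_i = (c_i - \gamma,\ c_i] \quad (i = 1, 2, \cdots, L), \]

where $c_i = K - (i-1)\gamma$. According to this interval partition, divide the set

\[ T_n^* = \left\{ \mathbf{x}\in\mathcal{X}^n \;\middle|\; -K < \frac{1}{n}\log\frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} \le K \right\} \]

into the $L$ subsets

\[ S_n^{(i)} = \left\{ \mathbf{x}\in\mathcal{X}^n \;\middle|\; \frac{1}{n}\log\frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} \in I_i \right\} \quad (i = 1, 2, \cdots, L). \]

This operation is called the information-spectrum slicing. Moreover, we define

\[ S_n^{(0)} = \left\{ \mathbf{x}\in\mathcal{X}^n \;\middle|\; \frac{1}{n}\log\frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} \le -K \right\}, \]

\[ S_n^{(-1)} = \left\{ \mathbf{x}\in\mathcal{X}^n \;\middle|\; \frac{1}{n}\log\frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} > K \right\}, \]

where it is obvious that $\mathcal{X}^n = \bigcup_{i=-1}^{L} S_n^{(i)}$. Suppose that $R$ is $r$-achievable, i.e., suppose that there exists a critical region $C_n$ such that

\[ \liminf_{n\to\infty}\frac{1}{n}\log\frac{1}{\mu_n} \ge r, \tag{4.5.17} \]

\[ \limsup_{n\to\infty}\frac{1}{n}\log\frac{1}{1-\lambda_n} \le R. \tag{4.5.18} \]

Then, from (4.5.17) we have

\[ \mu_n \le e^{-n(r-\tau)} \quad (\forall n \ge n_0), \tag{4.5.19} \]

where $\tau > 0$ is an arbitrarily small constant. In order to evaluate the value of $\Pr\{\overline{X}^n \in C_n\}$, let us first evaluate the value of $\Pr\{\overline{X}^n \in C_n^{(i)}\}$, where $C_n^{(i)} = S_n^{(i)} \cap C_n$ $(i = -1, 0, 1, 2, \cdots, L)$. We now evaluate $\Pr\{X^n \in C_n^{(i)}\}$ $(i = 1, 2, \cdots, L)$ in two ways as follows. First, we observe that

\[ \Pr\{X^n \in C_n^{(i)}\} \le \Pr\{X^n \in C_n\} = \mu_n, \]

which, together with (4.5.19), yields

\[ \Pr\{X^n \in C_n^{(i)}\} \le e^{-n(r-\tau)}. \tag{4.5.20} \]

Next, by the definitions of $\eta(c_i)$ and $S_n^{(i)}$, we see that

\[ \Pr\{X^n \in S_n^{(i)}\} \le \Pr\left\{ \frac{1}{n}\log\frac{P_{X^n}(X^n)}{P_{\overline{X}^n}(X^n)} \le c_i \right\} \le e^{-n(\eta(c_i)-\tau)} \quad (\forall n \ge n_0). \]

Hence,

\[ \Pr\{X^n \in C_n^{(i)}\} \le \Pr\{X^n \in S_n^{(i)}\} \le e^{-n(\eta(c_i)-\tau)}. \tag{4.5.21} \]

A consequence of (4.5.20) and (4.5.21) is

\[ \Pr\{X^n \in C_n^{(i)}\} \le e^{-n(\eta(c_i)+[r-\eta(c_i)]^+ -\tau)} \quad (i = 1, 2, \cdots, L). \tag{4.5.22} \]

We can now evaluate $\Pr\{\overline{X}^n \in C_n^{(i)}\}$ as follows. Since $\mathbf{x}\in C_n^{(i)}$ implies $\mathbf{x}\in S_n^{(i)}$ $(i = 1, 2, \cdots, L)$ and hence also $P_{\overline{X}^n}(\mathbf{x}) \le P_{X^n}(\mathbf{x})\, e^{-n(c_i-\gamma)}$, we have

\[ \begin{aligned}
\Pr\{\overline{X}^n \in C_n^{(i)}\} &= \sum_{\mathbf{x}\in C_n^{(i)}} P_{\overline{X}^n}(\mathbf{x}) \\
&\le \sum_{\mathbf{x}\in C_n^{(i)}} P_{X^n}(\mathbf{x})\, e^{-n(c_i-\gamma)} \\
&\le e^{-n(c_i+\eta(c_i)+[r-\eta(c_i)]^+ -\gamma-\tau)}
\end{aligned} \tag{4.5.23} \]

for $i = 1, 2, \cdots, L$, where we have used (4.5.22) in the last inequality. Furthermore, let us evaluate $\Pr\{\overline{X}^n \in S_n^{(-1)}\}$ and $\Pr\{\overline{X}^n \in S_n^{(0)}\}$. Since $\mathbf{x}\in S_n^{(-1)}$ implies $P_{\overline{X}^n}(\mathbf{x}) \le P_{X^n}(\mathbf{x})\, e^{-nK}$, we obtain

\[ \Pr\{\overline{X}^n \in S_n^{(-1)}\} = \sum_{\mathbf{x}\in S_n^{(-1)}} P_{\overline{X}^n}(\mathbf{x}) \le e^{-nK}. \tag{4.5.24} \]

Recalling here that

\[ S_n^{(0)} = \left\{ \mathbf{x}\in\mathcal{X}^n \;\middle|\; \frac{1}{n}\log\frac{P_{X^n}(\mathbf{x})}{P_{\overline{X}^n}(\mathbf{x})} \le -K \right\} \]

and noting the assumption (4.5.2), we see that for any $M > 0$ there exists a $K > 0$ large enough such that

\[ \Pr\{\overline{X}^n \in S_n^{(0)}\} \le e^{-n(M-\tau)} \quad (\forall n \ge n_0). \tag{4.5.25} \]

Summarizing (4.5.23)-(4.5.25), we have

\[ \begin{aligned}
1 - \lambda_n &= \Pr\{\overline{X}^n \in C_n\} = \sum_{i=-1}^{L} \Pr\{\overline{X}^n \in C_n^{(i)}\} \\
&\le \sum_{i=1}^{L} e^{-n(c_i+\eta(c_i)+[r-\eta(c_i)]^+ -\gamma-\tau)} + e^{-nK} + e^{-n(M-\tau)}.
\end{aligned} \tag{4.5.26} \]

On the other hand, since, by the definition (4.5.4) of $\rho_0$,

\[ c_i + \eta(c_i) + [r - \eta(c_i)]^+ \ge \rho_0 \quad (i = 1, 2, \cdots, L), \]

it follows from (4.5.26) that

\[ 1 - \lambda_n \le L\, e^{-n(\rho_0-\gamma-\tau)} + e^{-nK} + e^{-n(M-\tau)}. \]

Thus, if we take $M > 0$ and $K > 0$ large enough, then

\[ \limsup_{n\to\infty}\frac{1}{n}\log\frac{1}{1-\lambda_n} \ge \rho_0 - \gamma - \tau. \tag{4.5.27} \]

Therefore, $R \ge \rho_0 - \gamma - \tau$ holds, owing to (4.5.18) and (4.5.27). Since both $\gamma > 0$ and $\tau > 0$ are arbitrary, we can let $\gamma \to 0$ and $\tau \to 0$ to obtain $R \ge \rho_0$. Thus, we can conclude that any $r$-achievable rate $R$ cannot be smaller than $\rho_0$. □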
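The information-spectrum slicing used in the converse part can be illustrated on a small case. The sketch below is only an illustration under stated assumptions: both sources are binary and memoryless, logarithms are natural, sequences are grouped by their number of ones (all sequences with the same count share the same likelihood ratio), and the function name `slice_spectrum` is hypothetical. It returns, for every slice index, the probability mass under the null and under the alternative hypothesis.

```python
import math

def slice_spectrum(n, p, pbar, K, gamma):
    """Slice (-K, K] into L = 2K/gamma intervals I_i = (c_i - gamma, c_i], c_i = K - (i-1)*gamma.

    Index 0 collects sequences with rate <= -K, index -1 those with rate > K.
    Returns (L, {index: (mass under P, mass under Pbar)}).
    """
    L = round(2 * K / gamma)
    mass = {}
    for k in range(n + 1):                      # k = number of ones in the sequence
        z = ((n - k) * math.log(p[0] / pbar[0]) + k * math.log(p[1] / pbar[1])) / n
        if z > K:
            i = -1
        elif z <= -K:
            i = 0
        else:
            i = min(L, int((K - z) // gamma) + 1)
        count = math.comb(n, k)
        px = count * p[0] ** (n - k) * p[1] ** k
        pxbar = count * pbar[0] ** (n - k) * pbar[1] ** k
        a, b = mass.get(i, (0.0, 0.0))
        mass[i] = (a + px, b + pxbar)
    return L, mass
```

On every slice $i \ge 1$ the mass under the alternative is at most $e^{-n(c_i-\gamma)}$ times the mass under the null, which is the pointwise inequality behind (4.5.23).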



Example 4.5.1. Suppose that $\mathcal{X}$ is a finite source alphabet. Let the null hypothesis $\mathbf{X}$ and the alternative hypothesis $\overline{\mathbf{X}}$ be the stationary memoryless sources subject to probability distributions $P$ and $\overline{P}$, respectively. Here, for simplicity, we assume that $P(x) > 0$ for all $x\in\mathcal{X}$, which is the case that the assumption (4.5.2) is satisfied. As is shown in Example 4.4.1, in this setting $\eta(R) = 0$ for $R \ge D(P\|\overline{P})$ and

\[ \eta(R) = D(P_R\|P), \]
\[ R + \eta(R) = D(P_R\|\overline{P}) \]

for $R < D(P\|\overline{P})$, where $P_R$ denotes the projection of $P$ on the plane $\kappa_R$ defined in (4.4.22). We note here that, since $D(\mathbf{X}\|\overline{\mathbf{X}}) = D(P\|\overline{P})$, it suffices to consider $R$ satisfying $R \le D(P\|\overline{P})$ in (4.5.3) of Theorem 4.5.1 (see Remark 4.5.2). Then, since we have

\[ R + \eta(R) + [r - \eta(R)]^+ = D(P_R\|\overline{P}) + [r - D(P_R\|P)]^+, \]

Theorem 4.5.1 yields

\[ B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}}) = \inf_R \left\{ D(P_R\|\overline{P}) + [r - D(P_R\|P)]^+ \right\}. \tag{4.5.28} \]

This formula indicates that $B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}})$ is a monotone increasing function of $r$. In addition, while $B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}}) = 0$ for the case of $r \le D(\overline{P}\|P)$, $B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}})$ can be expressed as

\[ B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}}) = \inf_{R:\,D(P_R\|P)\le r} \left\{ D(P_R\|\overline{P}) + r - D(P_R\|P) \right\} \]

for the case of $r > D(\overline{P}\|P)$ because it is easily verified that $\inf_R$ on the right-hand side of (4.5.28) is achieved by $R$ satisfying $D(P_R\|P) \le r$ (Fig. 4.7). This formula is the same as the formula first developed by Han and Kobayashi [43] based on the argument of types. □
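The formula (4.5.28) can be evaluated numerically for a small alphabet. The sketch below is an illustration under stated assumptions: natural logarithms, strictly positive $P$ and $\overline{P}$, the projection $P_R$ parametrized as the tilted distribution proportional to $P^{1+\theta}\overline{P}^{\,-\theta}$ with $\theta \le 0$ (so that $R \le D(P\|\overline{P})$), a finite grid over $\theta$ in place of the infimum over $R$, and hypothetical function names.

```python
import math

def kl(q, p):
    """Divergence D(q||p) in nats."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def tilted(p, pbar, theta):
    """Distribution proportional to p^(1+theta) * pbar^(-theta)."""
    w = [a ** (1 + theta) * b ** (-theta) for a, b in zip(p, pbar)]
    z = sum(w)
    return [x / z for x in w]

def b_star(p, pbar, r, theta_min=-20.0, steps=20000):
    """Grid approximation of inf{ D(Q||pbar) + [r - D(Q||p)]^+ } over the tilted family."""
    best = math.inf
    for k in range(steps + 1):
        theta = theta_min * k / steps       # theta runs from 0 down to theta_min
        q = tilted(p, pbar, theta)
        best = min(best, kl(q, pbar) + max(r - kl(q, p), 0.0))
    return best
```

For `p = [0.7, 0.3]` and `pbar = [0.4, 0.6]` the value is 0 whenever $r \le D(\overline{P}\|P)$, because $\theta = -1$ gives $Q = \overline{P}$, and it increases with $r$ beyond that threshold.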







Fig. 4.7. 



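Example 4.5.2. Let us consider the hypothesis testing for first-order stationary irreducible Markov sources $\mathbf{X}$ and $\overline{\mathbf{X}}$ with a finite alphabet that are considered in Example 4.4.2. We use the notation that appeared in Example 4.4.2. From (4.4.25) and (4.4.26), we obtain $\eta(R) = 0$ for $R \ge D(P\|\overline{P}|p)$ and

\[ \eta(R) = D(P_R\|P|p_R), \]
\[ R + \eta(R) = D(P_R\|\overline{P}|p_R), \]

for $R < D(P\|\overline{P}|p)$. Then, Theorem 4.5.1 yields

\[ B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}}) = \inf_R \left\{ D(P_R\|\overline{P}|p_R) + [r - D(P_R\|P|p_R)]^+ \right\}. \]

Since it is easy to verify that, if $r > D(\overline{P}\|P|\overline{p})$, $\inf_R$ is attained by $R$ satisfying $D(P_R\|P|p_R) \le r$, we obtain

\[ \begin{aligned}
B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}}) &= \inf_{R:\,D(P_R\|P|p_R)\le r} \left\{ D(P_R\|\overline{P}|p_R) + r - D(P_R\|P|p_R) \right\} \\
&= \inf_{Q\in\mathcal{V}_0:\,D(Q\|P|q)\le r} \left\{ D(Q\|\overline{P}|q) + r - D(Q\|P|q) \right\},
\end{aligned} \tag{4.5.30} \]

where $\overline{p}$ denotes the stationary distribution of $\overline{P}$ (cf. Nakagawa and Kanaya [71]). In addition, we can check that $B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}}) = 0$ for all $r \le D(\overline{P}\|P|\overline{p})$. □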



Example 4.5.3. In order to generalize Example 4.5.2 above, let us consider the hypothesis testing with unifilar finite-state sources $\mathbf{X}$ and $\overline{\mathbf{X}}$ given in Example 4.4.3 in §4.4. We use the same notation as in Example 4.4.3. Since (4.4.35) and (4.4.36) hold from Sanov’s theorem on unifilar finite-state sources, Theorem 4.5.1 leads to the following formula of $B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}})$ for the hypothesis testing $\mathbf{X}$ against $\overline{\mathbf{X}}$:

\[ \begin{aligned}
B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}}) &= \inf_R \left\{ D(P_{X_R S_R}\|\overline{P}|P_{S_R}) + [r - D(P_{X_R S_R}\|P|P_{S_R})]^+ \right\} \\
&= \inf_{P_{XS}\in\mathcal{V}_0} \left\{ D(P_{XS}\|\overline{P}|P_S) + [r - D(P_{XS}\|P|P_S)]^+ \right\}.
\end{aligned} \tag{4.5.31} \]

Example 4.5.4. Let us consider the hypothesis testing for autoregressive processes with a finite alphabet $\mathcal{W}$ treated in Example 4.4.8 in §4.4. If $r > D(P_{\overline{W}}\|P_W)$, then the result of Example 4.5.1 yields the following formula:

\[ B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}}) = \inf_{Q:\,D(Q\|P_W)\le r} \left\{ D(Q\|P_{\overline{W}}) + r - D(Q\|P_W) \right\}. \tag{4.5.32} \]

Here, $B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}}) = 0$ for $r \le D(P_{\overline{W}}\|P_W)$. □



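Example 4.5.5. Let $\mathcal{X}$ be a finite alphabet. Consider the mixed source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and the stationary memoryless source $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ subject to a probability distribution $\overline{P}$ given in Example 4.2.1. We assume that $P_1(x) > 0$ and $P_2(x) > 0$ are satisfied for all $x\in\mathcal{X}$ in order to meet the assumption (4.5.2). Recall that the mixed source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is defined as

\[ P_{X^n}(\mathbf{x}) = \alpha_1 P_{X_1^n}(\mathbf{x}) + \alpha_2 P_{X_2^n}(\mathbf{x}) \quad (\forall \mathbf{x}\in\mathcal{X}^n), \tag{4.5.33} \]

where $\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$ and $\mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}$ denote the stationary memoryless sources subject to probability distributions $P_1$ and $P_2$, respectively. Now, we define $\nu_1$ and $\nu_2$ by (4.4.42) and (4.4.43) in Example 4.4.4, respectively, and the half-spaces $\kappa_R^{(1)}$ and $\kappa_R^{(2)}$ in $\mathcal{P}(\mathcal{X})$ by

\[ \kappa_R^{(1)} = \left\{ Q\in\mathcal{P}(\mathcal{X}) \;\middle|\; \sum_{x\in\mathcal{X}} Q(x)\log\frac{P_1(x)}{\overline{P}(x)} \le R \right\}, \tag{4.5.34} \]

\[ \kappa_R^{(2)} = \left\{ Q\in\mathcal{P}(\mathcal{X}) \;\middle|\; \sum_{x\in\mathcal{X}} Q(x)\log\frac{P_2(x)}{\overline{P}(x)} \le R \right\}, \tag{4.5.35} \]

where $\mathcal{P}(\mathcal{X})$ denotes the set of all probability distributions over $\mathcal{X}$. Denote by $P_R^{(1)}$ and $P_R^{(2)}$ the projections of $P_1$ and $P_2$ on $\nu_1\cap\kappa_R^{(1)}$ and $\nu_2\cap\kappa_R^{(2)}$, respectively. By taking (1.9.39) and (1.9.40) in Example 1.9.4 into consideration and applying Sanov’s theorem, we obtain

\[ \eta(R) = \min\left( D(P_R^{(1)}\|P_1),\ D(P_R^{(2)}\|P_2) \right). \tag{4.5.36} \]

This formula indicates that $\eta(R) = 0$ if $R > \min(D(P_1\|\overline{P}), D(P_2\|\overline{P}))$ and $\eta(R)$ is a continuous and monotone decreasing function of $R$. By substituting this $\eta(R)$ into the right-hand side of (4.5.3) in Theorem 4.5.1, we can compute values of $B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}})$ as a function of $r$. □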






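Example 4.5.6. Let us consider the hypothesis testing for the mixed sources
given in Example 4.4.5. First, we define $\nu_1$, $\nu_2$, $\mu_1$ and $\mu_2$ in the same way
as in Example 4.4.5 and the half-spaces

$K_R^{(1)} = \Bigl\{ Q \in \mathcal{P}(\mathcal{X}) \Bigm| \sum_{x \in \mathcal{X}} Q(x) \log \frac{P_1(x)}{\overline{P}_1(x)} \le R \Bigr\},$    (4.5.37)

$K_R^{(2)} = \Bigl\{ Q \in \mathcal{P}(\mathcal{X}) \Bigm| \sum_{x \in \mathcal{X}} Q(x) \log \frac{P_1(x)}{\overline{P}_2(x)} \le R \Bigr\},$    (4.5.38)

$K_R^{(3)} = \Bigl\{ Q \in \mathcal{P}(\mathcal{X}) \Bigm| \sum_{x \in \mathcal{X}} Q(x) \log \frac{P_2(x)}{\overline{P}_2(x)} \le R \Bigr\},$    (4.5.39)

$K_R^{(4)} = \Bigl\{ Q \in \mathcal{P}(\mathcal{X}) \Bigm| \sum_{x \in \mathcal{X}} Q(x) \log \frac{P_2(x)}{\overline{P}_1(x)} \le R \Bigr\}$    (4.5.40)

in the same way as Example 4.4.5. Denote by $P_1^{(R)}$ and $P_2^{(R)}$ the projections
of $P_1$ and $P_2$ on

$\nu_1 \cap \bigl( (\mu_1 \cap K_R^{(1)}) \cup (\mu_2 \cap K_R^{(2)}) \bigr),$
$\nu_2 \cap \bigl( (\mu_2 \cap K_R^{(3)}) \cup (\mu_1 \cap K_R^{(4)}) \bigr),$

respectively. Then, if we apply Sanov's theorem in the same way as Example 4.4.5,
we obtain

$\eta(R) = \min\bigl( D(P_1^{(R)}\|P_1),\; D(P_2^{(R)}\|P_2) \bigr),$    (4.5.41)

which is the same as $\eta(R)$ obtained in Example 4.4.5. This formula indicates
that $\eta(R) = 0$ if

$R \ge \min\bigl( D(P_1\|\overline{P}_1),\, D(P_1\|\overline{P}_2),\, D(P_2\|\overline{P}_1),\, D(P_2\|\overline{P}_2) \bigr)$

is satisfied and that $\eta(R)$ is a continuous and monotone decreasing function of $R$.
By substituting this $\eta(R)$ into the right-hand side of (4.5.3) in Theorem 4.5.1,
we can compute values of $B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}})$ as a function of $r$. □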



Remark 4.5.3. Unfortunately, there is no simple form of $B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}})$
corresponding to $B_e(r|\mathbf{X}\|\overline{\mathbf{X}})$ for the mixed sources $\mathbf{X}$ and $\overline{\mathbf{X}}$ given in (4.4.60)
in Remark 4.4.3. □



Example 4.5.7. So far, we have only considered the cases where $\mathcal{X}$ is a finite
alphabet. If we consider general stationary memoryless sources with an
alphabet $\mathcal{X}$ not restricted to a finite set, Sanov's theorem does not always hold.
However, we can compute $B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}})$ by using Cramér's theorem, which
always holds (suppose here that $\mathbf{X}$ and $\overline{\mathbf{X}}$ are stationary memoryless sources).
That is, as mentioned in Example 4.4.7, we have only to use the rate
function $I(x)$ and set

$\eta(R) = \inf_{x \le R} I(x)$    (4.5.42)

similarly to (4.4.83). Note that the right-hand side of (4.5.42) is expressed in 
terms of divergences (similarly to Sanov’s theorem) only if G P is satisfied, 
where we use the notation given in Example 4.4.7. □ 



Example 4.5.8. Let us consider the stationary memoryless Gaussian sources 
X = {P/t} and X = {P-r} treated in Example 4.4.9 in the preceding section. 
Since r]{R) and R^-r]{R) are given in (4.4.96) and (4.4.97), respectively, sub- 
stitution of these into (4.5.3) in Theorem 4.5.1 and some simple calculation 
yield 

B:(r|X||X) = (v^ - v^)^l[r > a] (4.5.43) 

(Fig 4.8), where a = D{P^\\P-j^). Notice here that this function and Pe(^|X||X) 
in Example 4.4.9 are symmetric with respect to the vertical axis. The formula 
(4.5.43) tells us that $B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}})$ is a monotone increasing function of $r$. It
also tells us that $B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}}) = 0$ for $r \le a$. □



[Fig. 4.8: plot of $B_e^*(r|\mathbf{X}\|\overline{\mathbf{X}})$ against $r$; figure not reproduced.]



4.6 Generalized Hypothesis Testing 

In all the hypothesis testing problems treated in this chapter, $P_{\overline{X}^n}$ in the
alternative hypothesis $\overline{\mathbf{X}} = \{\overline{X}^n\}_{n=1}^{\infty}$ is regarded as a probability distribution
(probability measure) over $\mathcal{X}^n$. However, all of the theorems, lemmas
and remarks except Theorem 4.3.2 still hold if $P_{\overline{X}^n}$ is replaced with another
nonnegative measure (not necessarily a probability measure) $G_n$ satisfying
$G_n(\emptyset) = 0$. Here, the error probability of the second kind $\lambda_n = \Pr\{\overline{X}^n \in A_n\}$
is interpreted as An = Gn{An)- In addition, the inequality 2(X||X) > 0 must 
be replaced with the inequality D(X||X) > —7^, where 



K = limsup — logG„(A’"), 

n— >oo U/ 



K = lim sup — log 

n—^oo U; 



1 

Gni^r^)' 



We also note that Theorem 4.5.1 holds only if k < Too is satisfied. We must 
replace ^*(0|X||X) — 0 in Theorem 4.5.1 with .B*(0|X||X) = k and 1 — An 
in Definition 4.5.1 by Kn — \n- 

As an example of such nonnegative measures $G_n$ $(n = 1, 2, \ldots)$ we may
consider the measure (called the counting measure) satisfying $G_n(\mathbf{x}) = 1$
$(\forall \mathbf{x} \in \mathcal{X}^n;\ \forall n = 1, 2, \ldots)$ if $\mathcal{X}$ is a finite or a countably infinite alphabet.
Another example may be the $n$-dimensional Lebesgue measure if $\mathcal{X}$ is the set
of all real numbers (Theorem 4.3.2 holds if $P_{\overline{X}^n}$ is replaced with the counting
measure or the Lebesgue measure). In particular, the hypothesis testing with
the counting measure as $G_n$ is nothing but the fixed-length source coding
described in Chapter 1, as will be shown in the following section.



Remark 4.6.1. If the probability distribution $P_{X^n}$ of the null hypothesis is
replaced with another nonnegative measure $F_n$ such that $F_n(\emptyset) = 0$, we can
easily verify that Theorem 4.4.1 and Theorem 4.5.1 still hold. We have only
to interpret probabilities in the proofs as the corresponding measures. □



4.7 Hypothesis Testing and Source Coding 

So far we have described theorems on hypothesis testing. In this section,
we point out that the hypothesis testing problems with a countably infinite
alphabet $\mathcal{X}$ are deeply related to the fixed-length source coding problems
described in Chapter 1.

For example, we can see that Theorem 1.9.1 on the source coding is obtained
as a special case of (the generalized version of) Theorem 4.4.1. To this
end, let the null hypothesis $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ be arbitrary and let the alternative
hypothesis $\overline{\mathbf{X}} = \{G_n\}_{n=1}^{\infty}$ be the counting measure

$C_n(\mathbf{x}) = 1 \quad (\forall \mathbf{x} \in \mathcal{X}^n)$

described in the preceding section. We denote by $\mathbf{C} = \{C_n\}_{n=1}^{\infty}$ this alternative
hypothesis. For an arbitrary given acceptance region $A_n \subset \mathcal{X}^n$, set
$M_n = |A_n|$ and define the mapping $\varphi_n : \mathcal{X}^n \to \mathcal{M}_n$ that maps each element
of $A_n$ to a distinct element of $\mathcal{M}_n = \{1, 2, \ldots, M_n\}$ in the order of $1, 2, \ldots$,
and all elements of $A_n^c$ to 1. Define $\psi_n : \mathcal{M}_n \to \mathcal{X}^n$ as the inverse mapping
of $\varphi_n|_{A_n}$. If we consider the source coding with $\varphi_n$ as an encoder and
$\psi_n$ as a decoder, we have $A_n = \{\mathbf{x} \in \mathcal{X}^n \mid \psi_n(\varphi_n(\mathbf{x})) = \mathbf{x}\}$, which leads to
the fact that the error probability of the first kind $\mu_n = \Pr\{X^n \notin A_n\}$ of
this hypothesis testing is equal to the decoding error probability $\varepsilon_n$ caused
by the code $(\varphi_n, \psi_n)$. Such a relationship between the hypothesis testing
and the source coding is one-to-one if we identify codes sharing the set
$A_n = \{\mathbf{x} \in \mathcal{X}^n \mid \psi_n(\varphi_n(\mathbf{x})) = \mathbf{x}\}$, the set of all correctly decodable $\mathbf{x} \in \mathcal{X}^n$,
as the same code. In this case, the (generalized) error probability of the
second kind $\lambda_n$ can be written as

$\lambda_n = C_n(A_n) = |A_n| = M_n = e^{n r_n}$    (4.7.1)

under the counting measure $C_n$, where

$r_n = \frac{1}{n} \log M_n$

means the coding rate of the code $(\varphi_n, \psi_n)$. Then, we obtain from (4.7.1)
that

$\liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\lambda_n} = -\limsup_{n\to\infty} r_n.$

Hence, $R$ being an $r$-achievable rate of the (generalized) hypothesis testing
is equivalent to $-R$ being an $r$-achievable rate of the source coding. From
Definition 1.9.1, Definition 1.9.2, Definition 4.4.1 and Definition 4.4.2, we
can obtain

$B_e(r|\mathbf{X}\|\mathbf{C}) = -R_e(r|\mathbf{X}) \quad (\forall r > 0)$    (4.7.2)

connecting $B_e(r|\mathbf{X}\|\mathbf{C})$ with $R_e(r|\mathbf{X})$.
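The correspondence between an acceptance region and a fixed-length code can be checked on a small case. The following is only an illustrative sketch: the binary i.i.d. source, the block length 4 and the particular region are assumptions made for the example, and the helper names are hypothetical.

```python
from itertools import product

def code_from_acceptance_region(accept):
    """Build (encoder, decoder, M) from a nonempty acceptance region A_n."""
    region = sorted(accept)                       # fix the order 1, 2, ..., M_n
    index = {x: i + 1 for i, x in enumerate(region)}
    M = len(region)

    def encoder(x):
        # Elements of A_n get distinct indices; everything else goes to 1.
        return index.get(x, 1)

    def decoder(i):
        # Inverse of the encoder restricted to A_n.
        return region[i - 1]

    return encoder, decoder, M

def sequence_prob(x, p):
    """Probability of x under an i.i.d. source with symbol distribution p."""
    prob = 1.0
    for s in x:
        prob *= p[s]
    return prob

# Assumed toy setting: binary i.i.d. source, n = 4, acceptance region made of
# all sequences containing at most one symbol 1.
p = {0: 0.9, 1: 0.1}
n = 4
space = list(product([0, 1], repeat=n))
A_n = {x for x in space if sum(x) <= 1}

enc, dec, M = code_from_acceptance_region(A_n)

# The correctly decodable sequences are exactly the acceptance region.
decodable = {x for x in space if dec(enc(x)) == x}

# Error probability of the first kind versus decoding error probability.
mu_n = sum(sequence_prob(x, p) for x in space if x not in A_n)
eps_n = sum(sequence_prob(x, p) for x in space if dec(enc(x)) != x)

# Second kind under the counting measure: C_n(A_n) = |A_n| = M_n.
lambda_n = len(A_n)
```

In this sketch `mu_n` and `eps_n` are sums over the same set of sequences, which mirrors the identity $\mu_n = \varepsilon_n$ used in the text, and `lambda_n` equals the code size as in (4.7.1).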

By using (4.7.2), we can obtain Theorem 1.9.1 from Theorem 4.4.1 and
vice versa. For example, Theorem 4.4.1 implies Theorem 1.9.1 in the following
manner. First, by recalling that the alternative hypothesis is the counting
measure $C_n$, the probability on the right-hand side of (4.4.1) defining $\eta(R)$
can be written as

$\Pr\Bigl\{ \frac{1}{n} \log \frac{P_{X^n}(X^n)}{C_n(X^n)} \le R \Bigr\} = \Pr\Bigl\{ \frac{1}{n} \log P_{X^n}(X^n) \le R \Bigr\} = \Pr\Bigl\{ \frac{1}{n} \log \frac{1}{P_{X^n}(X^n)} \ge -R \Bigr\}.$

Taking the definition of $\sigma(R)$ in (1.9.2) into consideration, we have $\eta(R) = \sigma(-R)$,
which leads to

$\sigma(R) = \eta(-R).$    (4.7.3)

Then, Theorem 4.4.1 on the (generalized) hypothesis testing and (4.7.2) yield

$R_e(r|\mathbf{X}) = -B_e(r|\mathbf{X}\|\mathbf{C})$
$\quad = -\inf_{R} \{ R + \eta(R) \mid \eta(R) < r \}$
$\quad = \sup_{R} \{ -R - \eta(R) \mid \eta(R) < r \}.$

By replacing $R$ by $-R$ and using (4.7.3), it follows that

$R_e(r|\mathbf{X}) = \sup_{R \ge 0} \{ R - \sigma(R) \mid \sigma(R) < r \},$

which is exactly the same as Theorem 1.9.1 on the source coding.

By using an argument similar to the one above, it is easy to verify
that Theorem 4.1.1, Theorem 4.2.1 and Theorem 4.3.1 with the counting
measure $\mathbf{C} = \{C_n\}_{n=1}^{\infty}$ as the alternative hypothesis $\overline{\mathbf{X}}$ coincide with
Theorem 1.3.1, Theorem 1.6.1 and Theorem 1.5.1 on the fixed-length source coding
that are described in Chapter 1. In addition, in this case (4.1.2) coincides with
Lemma 1.4.1 in Chapter 1.

Readers may feel that there may be a relationship between Theorem 4.5.1
on the hypothesis testing and Theorem 1.10.1 on the source coding similar
to the relationship between Theorem 4.4.1 and Theorem 1.9.1. Nevertheless, there is
no such relationship. This is because the two definitions of the $r$-achievable
rates are different. Recall that, while the $r$-achievable rate $R$ in
Definition 1.10.1 is defined under the constraint

$\limsup_{n\to\infty} \frac{1}{n} \log \frac{1}{1 - \varepsilon_n} \le r,$

the $r$-achievable rate $R$ in Definition 4.5.1 is defined under the constraint

$\liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\mu_n} \ge r,$

though we have $\mu_n = \varepsilon_n$ in the relationship between the hypothesis testing
and the source coding discussed so far.

However, we can obtain a relationship between them by modifying the
formulation of the source coding problem treated in §1.10. To this end, assume
that a source alphabet $\mathcal{X}$ is finite and consider the "dual coding rate"

$\rho_n = \frac{1}{n} \log\bigl( |\mathcal{X}|^n - M_n \bigr)$    (4.7.4)

instead of the coding rate $r_n = \frac{1}{n} \log M_n$. Furthermore, let us define, instead
of Definition 1.10.1 and Definition 1.10.2, respectively:






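Definition 4.7.1.

Rate $R$ is $r$-achievable $\overset{\mathrm{def}}{\Longleftrightarrow}$ There exists an $(n, M_n, \varepsilon_n)$-code
satisfying $\liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\varepsilon_n} \ge r$ and

$\liminf_{n\to\infty} \rho_n \ge R.$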

Definition 4.7.2 (supremum r-achievable fixed- length dual coding 
rate). 

Re{r\X.) = sup {R I R is r-achievable} , 



Such definitions make sense in the following case. Suppose that the decoding
error probability is required to satisfy $\varepsilon_n \simeq e^{-nr}$ for a large enough
$r > 0$. Since $\varepsilon_n$ becomes quite small in such a situation, the coding rate
$r_n = \frac{1}{n} \log M_n$ is nearly equal to $\log |\mathcal{X}|$. This implies that $M_n \simeq |\mathcal{X}|^n$.
Therefore, it is meaningful to evaluate $|\mathcal{X}|^n - M_n$ instead of $M_n$ itself. In this case
it is a fundamental problem in source coding to make the dual coding
rate $\rho_n$ in (4.7.4), which satisfies

$|\mathcal{X}|^n - M_n = e^{n\rho_n},$

as large as possible. Note that Re(r|X) in Definition 4.7.2 means the supre- 
mum of the dual coding rate pn with respect to all codes satisfying the con- 
dition 

lim inf — log — > r. 

n-^oo n Sn 

In this modified source coding problem, since $\mathbf{C} = \{C_n\}_{n=1}^{\infty}$ is defined as
the counting measure, we obtain

$|\mathcal{X}|^n - \lambda_n = e^{n\rho_n}$

instead of (4.7.1) and

$\limsup_{n\to\infty} \frac{1}{n} \log \frac{1}{|\mathcal{X}|^n - \lambda_n} = -\liminf_{n\to\infty} \rho_n.$

Therefore, it turns out that $R$ being the $r$-achievable rate of the (generalized)
hypothesis testing is equivalent to $-R$ being the $r$-achievable source coding
rate. From Definitions 4.5.1, 4.5.2, 4.7.1 and 4.7.2, we have the following
relationship:



B:(r|X||C) = -i?e(r|X). _ (4.7.5) 

By making use of this equality, if either B*(r|X||C) or Re{r\X.) can be com- 
puted, we can compute the other. However, as is easily verified, the assump- 
tion (4.5.2) in Theorem 4.5.1 does not hold for the case that X is equal to 
the counting measure C. This means that the formula (4.5.3) for B*(r|X||C) 
no longer holds. Thus, it is temporally impossible to obtain a formula for 
Re(r|X) on the source coding via the formula on the hypothesis testing, 
though the formula for Re(r|X) can be obtained in such a way. 



5 Rate-Distortion Theory 



5.1 Coding Subject to Distortion Criterion 

In the source coding treated in Chapter 1 we considered problems in which
we minimize the coding rate subject to constraints such as

$\lim_{n\to\infty} \varepsilon_n = 0,$

$\limsup_{n\to\infty} \varepsilon_n \le \varepsilon \quad (0 < \varepsilon < 1)$

or

$\liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\varepsilon_n} \ge r \quad (r > 0),$

where $\varepsilon_n$ denotes the decoding error probability. However, the source coding
problems subject to such constraints on the decoding error probability only
make sense when the size of a source alphabet is finite or countably infinite.
On the other hand, when we consider continuous sources that output
continuous values, it is impossible, and indeed meaningless, for an encoder with a finite
rate to encode original data $\mathbf{x}$ into reproduced data $\mathbf{y}$ in such a way that $\mathbf{y}$
coincides with $\mathbf{x}$. In order to make the encoding meaningful in such a case,
we need to permit finite distortion between $\mathbf{x}$ and $\mathbf{y}$ within some acceptable
level. This leads to the following formulation of the encoding problem: we
minimize the coding rate subject to a given criterion on distortion. In this
chapter we deal with such rate-distortion problems.



5.2 Rate-Distortion Theory for Stationary Memoryless 
Sources 

At the beginning of this chapter, we consider the rate-distortion theory for a
stationary memoryless source

$\mathbf{X} = \{X^n = (X_1, X_2, \ldots, X_n)\}_{n=1}^{\infty},$    (5.2.1)

which is the most fundamental source. In this section we assume that $X_i$
takes values in a finite source alphabet $\mathcal{X}$. We introduce another set $\mathcal{Y}$ called
the reproduction alphabet. The size of $\mathcal{Y}$ is assumed to be finite throughout
this section. We define a function $d : \mathcal{X} \times \mathcal{Y} \to \mathbf{R}^+ = [0, +\infty)$ called
the distortion measure and call $d(x, y)$ the distortion between $x \in \mathcal{X}$ and
$y \in \mathcal{Y}$. The distortion $d_n(\mathbf{x}, \mathbf{y})$ between $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ and
$\mathbf{y} = (y_1, y_2, \ldots, y_n) \in \mathcal{Y}^n$ of length $n$ is defined by

$d_n(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} d(x_i, y_i).$    (5.2.2)

Then, the distortion per source symbol is given by

$\frac{1}{n} d_n(\mathbf{x}, \mathbf{y}) \ge 0.$
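As a small illustration of the additive distortion (5.2.2) and of the per-symbol normalization, the following sketch uses the Hamming and squared-error single-letter distortions. Both choices of $d$ and all function names are assumptions made for the example, not part of the text.

```python
def hamming(x, y):
    """Single-letter Hamming distortion d(x, y)."""
    return 0.0 if x == y else 1.0

def squared_error(x, y):
    """Single-letter squared-error distortion d(x, y)."""
    return float((x - y) ** 2)

def additive_distortion(xs, ys, d=hamming):
    """d_n(x, y) = sum_i d(x_i, y_i) for sequences of equal length."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    return sum(d(x, y) for x, y in zip(xs, ys))

def per_symbol_distortion(xs, ys, d=hamming):
    """Distortion per source symbol, (1/n) d_n(x, y)."""
    return additive_distortion(xs, ys, d) / len(xs)

# One mismatch out of four positions.
ham = per_symbol_distortion([0, 1, 1, 0], [0, 1, 0, 0])
# Squared errors 0, 1 and 4 averaged over three positions.
sq = per_symbol_distortion([1, 2, 3], [1, 3, 5], d=squared_error)
```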

We encode a source $\mathbf{X}$ under the distortion measure $d_n$ in the following
way. We first set a code

$\mathcal{M}_n = \{1, 2, \ldots, M_n\}$

in the same way as in fixed-length coding and define an encoding function
(encoder) $\varphi_n : \mathcal{X}^n \to \mathcal{M}_n$ and a decoding function (decoder) $\psi_n : \mathcal{M}_n \to \mathcal{Y}^n$.
Here, we call $r_n = \frac{1}{n} \log M_n$ the coding rate and $\varphi_n(\mathbf{x})$ the codeword for $\mathbf{x}$. A
source output $\mathbf{x} \in \mathcal{X}^n$ is first encoded into a codeword by the encoder $\varphi_n$ and
then decoded into a $\mathbf{y} \in \mathcal{Y}^n$ by the decoder $\psi_n$. We would like to make both
the coding rate and the distortion per symbol $\frac{1}{n} d_n(\mathbf{x}, \mathbf{y})$ between $\mathbf{x}$ and $\mathbf{y}$
simultaneously as small as possible. We define the average distortion per
source symbol of the code $(\varphi_n, \psi_n)$ by

$\rho_n = \frac{1}{n} E\, d_n(X^n, \psi_n(\varphi_n(X^n))).$    (5.2.3)

This corresponds to the decoding error probability in the case of the source
coding.

We call the pair $(\varphi_n, \psi_n)$ of an encoder and a decoder with the code of size
$M_n$ and the average distortion $\rho_n$ an $(n, M_n, \rho_n)$-code. The rate-distortion
problem is formulated in the following way:

Definition 5.2.1.

Rate $R$ is $D$-achievable $\overset{\mathrm{def}}{\Longleftrightarrow}$ There exists an $(n, M_n, \rho_n)$-code satisfying

$\limsup_{n\to\infty} \rho_n \le D$ and $\limsup_{n\to\infty} \frac{1}{n} \log M_n \le R.$

Definition 5.2.2 (rate-distortion function).

$R(D|\mathbf{X}) = \inf \{ R \mid R \text{ is } D\text{-achievable} \}.$



$R(D|\mathbf{X})$ represents the infimum of the coding rate for encoding with
the average distortion asymptotically less than or equal to $D$. The objective
of this section is the determination of $R(D|\mathbf{X})$ as a function of $D$. Denoting the
mutual information between $X$ and $Y$ by $I(X; Y)$, we have the following
theorem:

Theorem 5.2.1 (Shannon [79]). The rate-distortion function for the
stationary memoryless source $\mathbf{X}$ subject to a probability distribution $P_X$ is given
by

$R(D|\mathbf{X}) = \min_{Y:\, E d(X,Y) \le D} I(X; Y),$    (5.2.4)

where the minimum on the right-hand side is taken with respect to all random
variables $Y \in \mathcal{Y}$ ($Y$ can be correlated with $X$) satisfying $E d(X, Y) \le D$.
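The minimization in (5.2.4) can be explored numerically. The sketch below uses the Blahut-Arimoto alternating iteration, which is a standard numerical method and not something the text itself describes; the slope parameter `beta`, the iteration count and the function names are assumptions of the example. For a fixed `beta` it returns one point $(D, R)$, in nats, and for a binary source with Hamming distortion that point can be compared with the known closed form $h(p) - h(D)$.

```python
import math

def binary_entropy(t):
    """Binary entropy in nats."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log(t) - (1.0 - t) * math.log(1.0 - t)

def blahut_arimoto(p, d, beta, iters=2000):
    """Return (D, R) in nats for source distribution p, distortion matrix d
    and slope parameter beta > 0, via Blahut-Arimoto iterations."""
    nx, ny = len(p), len(d[0])
    q = [1.0 / ny] * ny                       # output marginal, start uniform
    Q = []
    for _ in range(iters):
        # Test channel Q(y|x) proportional to q(y) * exp(-beta * d(x, y)).
        Q = []
        for x in range(nx):
            w = [q[y] * math.exp(-beta * d[x][y]) for y in range(ny)]
            s = sum(w)
            Q.append([v / s for v in w])
        # Output marginal induced by the current test channel.
        q = [sum(p[x] * Q[x][y] for x in range(nx)) for y in range(ny)]
    # Expected distortion and mutual information of the final pair (p, Q).
    D = sum(p[x] * Q[x][y] * d[x][y] for x in range(nx) for y in range(ny))
    R = sum(p[x] * Q[x][y] * math.log(Q[x][y] / q[y])
            for x in range(nx) for y in range(ny) if Q[x][y] > 0.0)
    return D, R

# Assumed example: Bernoulli(0.3) source with Hamming distortion.
p = [0.7, 0.3]
d = [[0.0, 1.0], [1.0, 0.0]]
D, R = blahut_arimoto(p, d, beta=2.0)
closed_form = binary_entropy(0.3) - binary_entropy(D)
```

Because the returned `R` is the mutual information of an actual test channel with expected distortion `D`, it is always an upper bound on the minimum in (5.2.4); after enough iterations it should agree with `closed_form` in this example.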



Remark 5.2.1. The right-hand side of (5.2.4) is convex as a function of $D$.
This can be verified in the following way. Let $Y_1$ and $Y_2$ be arbitrary random
variables satisfying $E d(X, Y_1) \le D_1$ and $E d(X, Y_2) \le D_2$, respectively, and
let $\alpha_1 > 0$ and $\alpha_2 > 0$ be arbitrary constants satisfying $\alpha_1 + \alpha_2 = 1$. We
introduce another random variable $Q$, which is independent of $X$, satisfying
$P_Q(1) = \alpha_1$ and $P_Q(2) = \alpha_2$. By using $Q$, we define a random variable $Y$ by
$Y = Y_i$ if $Q = i$ for $i = 1, 2$. Then, it follows that

$E d(X, Y) = \alpha_1 E d(X, Y_1) + \alpha_2 E d(X, Y_2) \le \alpha_1 D_1 + \alpha_2 D_2.$    (5.2.5)

On the other hand, by noticing that $Q$ is independent of $X$ and therefore
$I(X; Q) = 0$, we have

$\alpha_1 I(X; Y_1) + \alpha_2 I(X; Y_2) = I(X; Y|Q)$
$\quad = I(X; QY)$
$\quad \ge I(X; Y).$    (5.2.6)

The combination of (5.2.5) and (5.2.6) implies that the function

$f(D) = \min_{Y:\, E d(X,Y) \le D} I(X; Y)$

is convex as a function of $D$. Note that the convexity of $f(D)$ guarantees the
continuity of $f(D)$ with respect to $D$. □



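Proof of Theorem 5.2.1.

1) Direct part:

The proof of the direct part is based on the proof given by Gallager
[30] (or Cover and Thomas [17]). We first prove that, given an arbitrary $Y$
satisfying $E d(X, Y) \le D - \gamma$, $R = I(X; Y)$ is $D$-achievable, where $\gamma > 0$
is an arbitrarily small constant. Let $X^n Y^n = (X_1 Y_1, X_2 Y_2, \ldots, X_n Y_n)$ be
stationary and independent copies of $XY$ and set $X^n = (X_1, X_2, \ldots, X_n)$
and $Y^n = (Y_1, Y_2, \ldots, Y_n)$. Define

$T_n^{(1)} = \Bigl\{ (\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n \Bigm| \Bigl| \frac{1}{n} \log \frac{P_{Y^n|X^n}(\mathbf{y}|\mathbf{x})}{P_{Y^n}(\mathbf{y})} - I(X; Y) \Bigr| < \gamma \Bigr\},$    (5.2.7)

$T_n^{(2)} = \Bigl\{ (\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n \Bigm| \Bigl| \frac{1}{n} d_n(\mathbf{x}, \mathbf{y}) - E d(X, Y) \Bigr| < \gamma \Bigr\}$    (5.2.8)

and $T_n = T_n^{(1)} \cap T_n^{(2)}$. Notice here that $E d(X, Y) \le D - \gamma$ and (5.2.8) imply
that

$\frac{1}{n} d_n(\mathbf{x}, \mathbf{y}) < D$ if $(\mathbf{x}, \mathbf{y}) \in T_n.$    (5.2.9)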

Since $X^n Y^n = (X_1 Y_1, X_2 Y_2, \ldots, X_n Y_n)$ is stationary and independent, it
holds that

$\frac{1}{n} \log \frac{P_{Y^n|X^n}(Y^n|X^n)}{P_{Y^n}(Y^n)} = \frac{1}{n} \sum_{i=1}^{n} \log \frac{P_{Y|X}(Y_i|X_i)}{P_Y(Y_i)},$    (5.2.10)

$\frac{1}{n} d_n(X^n, Y^n) = \frac{1}{n} \sum_{i=1}^{n} d(X_i, Y_i).$    (5.2.11)

We note that the terms in the sums on the right-hand sides of (5.2.10) and
(5.2.11) are independent and satisfy

$E\Bigl[ \log \frac{P_{Y|X}(Y_i|X_i)}{P_Y(Y_i)} \Bigr] = I(X; Y) \quad (i = 1, 2, \ldots),$

$E d(X_i, Y_i) = E d(X, Y) \quad (i = 1, 2, \ldots).$

Since their variances are finite (see Remark 3.1.1), recalling the definitions of
$T_n^{(1)}$ and $T_n^{(2)}$ in (5.2.7) and (5.2.8), respectively, and applying Chebyshev's
inequality yield

$\Pr\{ X^n Y^n \in T_n^{(1)} \} \to 1$ as $n \to \infty,$

$\Pr\{ X^n Y^n \in T_n^{(2)} \} \to 1$ as $n \to \infty.$

Therefore, we have

$\Pr\{ X^n Y^n \in T_n \} \to 1$ as $n \to \infty.$    (5.2.12)



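Here, we generate $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{M_n} \in \mathcal{Y}^n$ independently subject to the
probability distribution $P_{Y^n}$ and define $\mathcal{C} = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{M_n}\}$, where

$M_n = e^{n(I(X;Y) + 2\gamma)}$

(random coding). Next, for each $\mathbf{x} \in \mathcal{X}^n$ we define an encoder $\varphi_n$ by $\varphi_n(\mathbf{x}) = i_0$,
where $i_0$ is determined from

$d_n(\mathbf{x}, \mathbf{v}_{i_0}) = \min_{1 \le i \le M_n} d_n(\mathbf{x}, \mathbf{v}_i).$    (5.2.13)

We define a decoder $\psi_n$ by $\psi_n(i) = \mathbf{v}_i$ if the decoder receives $i$ from the
encoder. If we define

$P_e^{(n)} = \Pr\Bigl\{ \frac{1}{n} d_n(X^n, \psi_n(\varphi_n(X^n))) > D \Bigr\}$    (5.2.14)

for such a randomly defined pair $(\varphi_n, \psi_n)$, we obtain

$\frac{1}{n} E_{\mathcal{C}} E\, d_n(X^n, \psi_n(\varphi_n(X^n))) \le D + P_e^{(n)} d_{\max},$    (5.2.15)

where $d_{\max} = \max_{x \in \mathcal{X},\, y \in \mathcal{Y}} d(x, y)$, and $E$ and $E_{\mathcal{C}}$ denote the expectation with
respect to $X^n$ and the random code $\mathcal{C}$, respectively.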
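The random coding construction and the minimum-distortion encoder (5.2.13) can be simulated for a short block length. This is only a sketch under assumptions chosen for the example: a uniform binary source, Hamming distortion, codewords drawn i.i.d. uniform (which plays the role of $P_{Y^n}$ here), block length 12 and 64 codewords, that is, rate one half bit per symbol. None of these values come from the text.

```python
import random

def hamming_distance(x, y):
    """Additive Hamming distortion d_n(x, y)."""
    return sum(a != b for a, b in zip(x, y))

def random_codebook(n, M, rng):
    """Draw M codewords of length n i.i.d. from the uniform distribution."""
    return [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(M)]

def encode(x, codebook):
    """Index of a codeword attaining the minimum distortion, as in (5.2.13)."""
    return min(range(len(codebook)),
               key=lambda i: hamming_distance(x, codebook[i]))

def decode(i, codebook):
    """The decoder simply outputs the i-th codeword."""
    return codebook[i]

def empirical_distortion(n, M, trials, seed=0):
    """Monte Carlo estimate of the average distortion per source symbol."""
    rng = random.Random(seed)
    codebook = random_codebook(n, M, rng)
    total = 0
    for _ in range(trials):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        y = decode(encode(x, codebook), codebook)
        total += hamming_distance(x, y)
    return total / (trials * n)

avg = empirical_distortion(n=12, M=64, trials=200)
```

At such a short block length the estimate stays well above the distortion at which the rate-distortion function equals one half bit (about 0.11 for this source); the proof only claims that the gap closes as $n \to \infty$.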

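Hereafter, we evaluate $P_e^{(n)}$. Since $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{M_n}$ are independent and
subject to the identical distribution $P_{Y^n}$, the definition of $(\varphi_n, \psi_n)$ and
(5.2.14) lead to

$P_e^{(n)} = \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x}) \prod_{i=1}^{M_n} \Pr\Bigl\{ \frac{1}{n} d_n(\mathbf{x}, \mathbf{v}_i) > D \Bigr\}$

$\quad = \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x}) \Bigl( \Pr\Bigl\{ \frac{1}{n} d_n(\mathbf{x}, Y^n) > D \Bigr\} \Bigr)^{M_n}$

$\quad = \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x}) \Bigl( 1 - \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n}(\mathbf{y})\, \mathbf{1}\Bigl[ \frac{1}{n} d_n(\mathbf{x}, \mathbf{y}) \le D \Bigr] \Bigr)^{M_n}.$    (5.2.16)

By setting

$J(\mathbf{x}, \mathbf{y}) = \begin{cases} 1 & \text{if } (\mathbf{x}, \mathbf{y}) \in T_n, \\ 0 & \text{otherwise,} \end{cases}$    (5.2.17)

we have

$J(\mathbf{x}, \mathbf{y}) \le \mathbf{1}\Bigl[ \frac{1}{n} d_n(\mathbf{x}, \mathbf{y}) \le D \Bigr]$

from (5.2.9). Hence, $P_e^{(n)}$ in (5.2.16) is upper-bounded as follows:

$P_e^{(n)} \le \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x}) \Bigl( 1 - \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n}(\mathbf{y}) J(\mathbf{x}, \mathbf{y}) \Bigr)^{M_n}.$    (5.2.18)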



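We notice here that, since (5.2.7) guarantees that

$P_{Y^n}(\mathbf{y}) \ge e^{-n(I(X;Y) + \gamma)} P_{Y^n|X^n}(\mathbf{y}|\mathbf{x})$

for $(\mathbf{x}, \mathbf{y}) \in T_n$, it holds that

$P_{Y^n}(\mathbf{y}) J(\mathbf{x}, \mathbf{y}) \ge e^{-n(I(X;Y) + \gamma)} P_{Y^n|X^n}(\mathbf{y}|\mathbf{x}) J(\mathbf{x}, \mathbf{y})$

for any $(\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n$. By substituting this into the right-hand side of
(5.2.18), we obtain

$P_e^{(n)} \le \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x}) \Bigl( 1 - e^{-n(I(X;Y) + \gamma)} \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n|X^n}(\mathbf{y}|\mathbf{x}) J(\mathbf{x}, \mathbf{y}) \Bigr)^{M_n}.$    (5.2.19)

If we use the inequality

$(1 - x)^y \le e^{-xy} \quad (0 \le x \le 1,\ y \ge 0)$    (5.2.20)

and $M_n = e^{n(I(X;Y) + 2\gamma)}$, it follows that

$P_e^{(n)} \le \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x}) \exp\Bigl[ -e^{n\gamma} \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n|X^n}(\mathbf{y}|\mathbf{x}) J(\mathbf{x}, \mathbf{y}) \Bigr].$

Furthermore, by using the inequality

$e^{-xy} \le 1 - y + e^{-x} \quad (x \ge 0,\ 0 \le y \le 1),$    (5.2.21)

we obtain

$P_e^{(n)} \le 1 - \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x}) \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n|X^n}(\mathbf{y}|\mathbf{x}) J(\mathbf{x}, \mathbf{y}) + e^{-e^{n\gamma}}$

$\quad = 1 - \Pr\{ X^n Y^n \in T_n \} + e^{-e^{n\gamma}}$

$\quad = \Pr\{ X^n Y^n \notin T_n \} + e^{-e^{n\gamma}}.$    (5.2.22)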
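The two elementary inequalities (5.2.20) and (5.2.21) used above can be spot-checked on a grid. The sketch below is a numerical sanity check only, not a proof; the grid ranges and the small floating-point tolerance are assumptions of the example.

```python
import math

TOL = 1e-12  # slack for floating-point rounding

def holds_5_2_20(x, y):
    """(1 - x)^y <= exp(-x*y) for 0 <= x <= 1 and y >= 0."""
    return (1.0 - x) ** y <= math.exp(-x * y) + TOL

def holds_5_2_21(x, y):
    """exp(-x*y) <= 1 - y + exp(-x) for x >= 0 and 0 <= y <= 1."""
    return math.exp(-x * y) <= 1.0 - y + math.exp(-x) + TOL

# x in [0, 1] with step 0.05 and y in [0, 20] with step 0.5.
grid_20 = [(i / 20, j / 2) for i in range(21) for j in range(41)]
# x in [0, 20] with step 0.5 and y in [0, 1] with step 0.05.
grid_21 = [(i / 2, j / 20) for i in range(41) for j in range(21)]

all_20 = all(holds_5_2_20(x, y) for x, y in grid_20)
all_21 = all(holds_5_2_21(x, y) for x, y in grid_21)

# The residual term in (5.2.22) vanishes doubly exponentially in n.
residual = [math.exp(-math.exp(n * 0.1)) for n in (10, 20, 40)]
```

The first inequality follows from $1 - x \le e^{-x}$, and the second from the convexity of $y \mapsto e^{-xy}$ on $[0, 1]$, which gives $e^{-xy} \le (1 - y) + y e^{-x}$.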

Since $e^{-e^{n\gamma}} \to 0$ as $n \to \infty$ and the first term on the right-hand side satisfies
$\Pr\{ X^n Y^n \notin T_n \} \to 0$ as $n \to \infty$ from (5.2.12), we obtain $P_e^{(n)} \to 0$ as
$n \to \infty$. This, together with (5.2.15), guarantees the existence of at least one
deterministic pair $(\varphi_n, \psi_n)$ of an encoder and a decoder satisfying

$\rho_n = \frac{1}{n} E\, d_n(X^n, \psi_n(\varphi_n(X^n))) \le D + P_e^{(n)} d_{\max}.$

Hence, we have

$\limsup_{n\to\infty} \rho_n \le D.$

By recalling that $M_n = e^{n(I(X;Y) + 2\gamma)}$, it clearly holds that

$\limsup_{n\to\infty} \frac{1}{n} \log M_n \le I(X; Y) + 2\gamma,$

which means that $R = I(X; Y) + 2\gamma$ is $D$-achievable. Since $Y$ is arbitrary as
far as it satisfies $E d(X, Y) \le D - \gamma$, it follows that

$R(D|\mathbf{X}) \le \min_{Y:\, E d(X,Y) \le D - \gamma} I(X; Y) + 2\gamma.$

Notice here that the right-hand side of this inequality is continuous with
respect to $\gamma$ due to Remark 5.2.1. Since $\gamma > 0$ is arbitrary, we obtain

$R(D|\mathbf{X}) \le \min_{Y:\, E d(X,Y) \le D} I(X; Y)$

by letting $\gamma \to 0$.

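2) Converse part:

Suppose that $R$ is $D$-achievable. Then, there exists a pair $(\varphi_n, \psi_n)$ of an
encoder and a decoder satisfying

$\limsup_{n\to\infty} \frac{1}{n} E\, d_n(X^n, \psi_n(\varphi_n(X^n))) \le D,$    (5.2.23)

$\limsup_{n\to\infty} \frac{1}{n} \log M_n \le R.$    (5.2.24)

Setting

$Y^n = (Y_1^{(n)}, Y_2^{(n)}, \ldots, Y_n^{(n)}) = \psi_n(\varphi_n(X^n)),$

we note that $Y^n$ can take at most $M_n$ values because the size of the range of
the encoding function $\varphi_n$ is at most $M_n$. This leads to

$\log M_n \ge H(Y^n)$
$\quad \ge I(X^n; Y^n)$
$\quad = H(X^n) - H(X^n|Y^n)$
$\quad = \sum_{i=1}^{n} H(X_i) - \sum_{i=1}^{n} H(X_i | Y^n X^{i-1}),$

where $X^{i-1} = (X_1, X_2, \ldots, X_{i-1})$ and the last equality follows from the
assumption that the source is memoryless and the chain rule of the entropy.
By noticing

$H(X_i | Y^n X^{i-1}) \le H(X_i | Y_i^{(n)}),$

we have

$\log M_n \ge \sum_{i=1}^{n} H(X_i) - \sum_{i=1}^{n} H(X_i | Y_i^{(n)})$
$\quad = \sum_{i=1}^{n} I(X_i; Y_i^{(n)}),$

which means

$\frac{1}{n} \log M_n \ge \frac{1}{n} \sum_{i=1}^{n} I(X_i; Y_i^{(n)}).$    (5.2.25)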



Now we introduce the random variable $Q_n \in \{1, 2, \ldots, n\}$ satisfying $P_{Q_n}(i) = 1/n$
for $i = 1, 2, \ldots, n$ and define a random variable $Y^{(n)}$ by $Y^{(n)} = Y_i^{(n)}$
if $Q_n = i$. Then, (5.2.25) can be written as

$\frac{1}{n} \log M_n \ge I(X; Y^{(n)} | Q_n).$    (5.2.26)

By noticing that $Q_n$ is independent of $X$ from the stationarity of the source,
it holds that

$I(X; Y^{(n)} | Q_n) = I(X; Q_n Y^{(n)}) \ge I(X; Y^{(n)}).$

Thus, (5.2.26) becomes

$\frac{1}{n} \log M_n \ge I(X; Y^{(n)}).$    (5.2.27)

On the other hand, since we have

$\frac{1}{n} E\, d_n(X^n, \psi_n(\varphi_n(X^n))) = \frac{1}{n} E\, d_n(X^n, Y^n)$
$\quad = \frac{1}{n} \sum_{i=1}^{n} E d(X_i, Y_i^{(n)})$
$\quad = E d(X, Y^{(n)}),$

(5.2.23) becomes

$D \ge \limsup_{n\to\infty} E d(X, Y^{(n)}).$

Therefore, for an arbitrarily small constant $\gamma > 0$ we have

$D + \gamma \ge E d(X, Y^{(n)}) \quad (\forall n \ge n_0).$

Combining this inequality with (5.2.27) yields

$\frac{1}{n} \log M_n \ge \min_{Y:\, E d(X,Y) \le D + \gamma} I(X; Y),$

which, together with (5.2.24), shows

$R \ge \min_{Y:\, E d(X,Y) \le D + \gamma} I(X; Y).$    (5.2.28)

Notice here that from Remark 5.2.1 the right-hand side of this inequality is
continuous with respect to $\gamma$. Since $\gamma > 0$ is arbitrary, we can conclude that

$R \ge \min_{Y:\, E d(X,Y) \le D} I(X; Y)$

by letting $\gamma \to 0$. This establishes

$R(D|\mathbf{X}) \ge \min_{Y:\, E d(X,Y) \le D} I(X; Y).$






5.3 General Rate-Distortion Theory 

In this section we extend the rate-distortion theory from the viewpoint of
the information-spectrum so that we can treat arbitrary general sources

$\mathbf{X} = \{X^n = (X_1^{(n)}, X_2^{(n)}, \ldots, X_n^{(n)})\}_{n=1}^{\infty},$

while the rate-distortion theory treated
in the preceding section only deals with stationary memoryless sources. Unless
stated otherwise, a source alphabet $\mathcal{X}$ and a reproduction alphabet $\mathcal{Y}$ are
assumed to be arbitrary sets (not restricted to finite sets) throughout this
chapter.

First, let us define a general distortion measure. We call an arbitrary given
function

$d_n : \mathcal{X}^n \times \mathcal{Y}^n \to \mathbf{R}^+ = [0, +\infty) \quad (n = 1, 2, \ldots)$

the distortion measure. We also call $d_n(\mathbf{x}, \mathbf{y})$ the distortion between $\mathbf{x} \in \mathcal{X}^n$
and $\mathbf{y} \in \mathcal{Y}^n$. Then, the distortion per source symbol is given by

$\frac{1}{n} d_n(\mathbf{x}, \mathbf{y}) \ge 0.$    (5.3.1)

In the preceding section we have treated a distortion measure $d_n$ satisfying
(5.2.2). Such a distortion measure is said to be additive. Hereafter, we
consider distortion measures not restricted to additive distortion measures.

There are four kinds of rate-distortion problems for general sources under
the general distortion measure. That is, we have two kinds of rate-distortion
problems according to whether we consider fixed-length coding or variable-length
coding. Furthermore, for each coding we consider two kinds of criteria on
distortion. One is the average distortion criterion and the other is the maximum
distortion criterion. We formulate the four kinds of rate-distortion problems
in sequence.

a) Fixed-length coding under the maximum distortion criterion

The encoder $\varphi_n$, the decoder $\psi_n$ and the coding rate $r_n$ are defined in the
same way as in the preceding section (fixed-length coding). Here
we consider not the average distortion $\rho_n$ but the maximum distortion defined
by

$\mathrm{p}\text{-}\limsup_{n\to\infty} \frac{1}{n} d_n(X^n, \psi_n(\varphi_n(X^n))).$

This quantity is called the maximum distortion of the code $(\varphi_n, \psi_n)$. Denoting
by $\|f\|$ the size of the range of a mapping $f$, the fixed-length encoding
problem under the maximum distortion criterion is formulated in the following
way:

Definition 5.3.1.

$(R, D)$ is fm-achievable $\overset{\mathrm{def}}{\Longleftrightarrow}$ There exists a code $(\varphi_n, \psi_n)$ satisfying

$\mathrm{p}\text{-}\limsup_{n\to\infty} \frac{1}{n} d_n(X^n, \psi_n(\varphi_n(X^n))) \le D$ and

$\limsup_{n\to\infty} \frac{1}{n} \log \|\varphi_n\| \le R.$

Definition 5.3.2. Define

$R_{fm}(D|\mathbf{X}) = \inf \{ R \mid (R, D) \text{ is fm-achievable} \},$

$D_{fm}(R|\mathbf{X}) = \inf \{ D \mid (R, D) \text{ is fm-achievable} \}.$

We call $R_{fm}(D|\mathbf{X})$ and $D_{fm}(R|\mathbf{X})$ the fm-rate-distortion function and the
fm-distortion-rate function, respectively.

b) Fixed-length coding under the average distortion criterion

The encoder $\varphi_n$, the decoder $\psi_n$ and the coding rate $r_n$ are defined in the
same way as before (fixed-length coding). Here, however, we consider

$\rho_n = \frac{1}{n} E\, d_n(X^n, \psi_n(\varphi_n(X^n)))$

as a criterion on distortion. We call $\rho_n$ the average distortion of the code
$(\varphi_n, \psi_n)$. The rate-distortion problem treated in the preceding section is
included in this problem as a special case. The fixed-length coding problem
under the average distortion criterion is formulated in the following way:

Definition 5.3.3.

$(R, D)$ is fa-achievable $\overset{\mathrm{def}}{\Longleftrightarrow}$ There exists a code $(\varphi_n, \psi_n)$ satisfying

$\limsup_{n\to\infty} \frac{1}{n} E\, d_n(X^n, \psi_n(\varphi_n(X^n))) \le D$ and

$\limsup_{n\to\infty} \frac{1}{n} \log \|\varphi_n\| \le R.$

Definition 5.3.4. Define

$R_{fa}(D|\mathbf{X}) = \inf \{ R \mid (R, D) \text{ is fa-achievable} \},$

$D_{fa}(R|\mathbf{X}) = \inf \{ D \mid (R, D) \text{ is fa-achievable} \}.$

We call $R_{fa}(D|\mathbf{X})$ and $D_{fa}(R|\mathbf{X})$ the fa-rate-distortion function and the
fa-distortion-rate function, respectively.

c) Variable-length coding under the maximum distortion criterion

Here we consider the encoder defined by an encoding function

$\varphi_n : \mathcal{X}^n \to \mathcal{U}^*,$

where $\mathcal{U} = \{0, 1, \ldots, K - 1\}$ denotes the code alphabet and $\{\varphi_n(\mathbf{x})\}_{\mathbf{x} \in \mathcal{X}^n}$ is
supposed to be a prefix code (see §1.2 in Chapter 1). The decoder is determined
by a decoding function

$\psi_n : \{\varphi_n(\mathbf{x})\}_{\mathbf{x} \in \mathcal{X}^n} \to \mathcal{Y}^n.$

It is important for this kind of variable-length coding to make the average
codeword length per source symbol

$r_n = \frac{1}{n} E |\varphi_n(X^n)|$

as small as possible. We call this $r_n$ the coding rate of the variable-length
coding.

The following definitions formulate the variable-length coding problem
under the maximum distortion criterion given in a).

Definition 5.3.5.

$(R, D)$ is vm-achievable $\overset{\mathrm{def}}{\Longleftrightarrow}$ There exists a prefix code $(\varphi_n, \psi_n)$ satisfying

$\mathrm{p}\text{-}\limsup_{n\to\infty} \frac{1}{n} d_n(X^n, \psi_n(\varphi_n(X^n))) \le D$ and

$\limsup_{n\to\infty} \frac{1}{n} E |\varphi_n(X^n)| \le R.$

Definition 5.3.6. Define

$R_{vm}(D|\mathbf{X}) = \inf \{ R \mid (R, D) \text{ is vm-achievable} \},$

$D_{vm}(R|\mathbf{X}) = \inf \{ D \mid (R, D) \text{ is vm-achievable} \}.$

We call $R_{vm}(D|\mathbf{X})$ and $D_{vm}(R|\mathbf{X})$ the vm-rate-distortion function and the
vm-distortion-rate function, respectively.

d) Variable-length coding under the average distortion criterion

Here we consider the variable-length code $(\varphi_n, \psi_n)$ defined in c) under the
average distortion criterion described in b). The coding problem of this case
is formulated in the following way:

Definition 5.3.7.

$(R, D)$ is va-achievable $\overset{\mathrm{def}}{\Longleftrightarrow}$ There exists a prefix code $(\varphi_n, \psi_n)$ satisfying

$\limsup_{n\to\infty} \frac{1}{n} E\, d_n(X^n, \psi_n(\varphi_n(X^n))) \le D$ and

$\limsup_{n\to\infty} \frac{1}{n} E |\varphi_n(X^n)| \le R.$

Definition 5.3.8. Define

$R_{va}(D|\mathbf{X}) = \inf \{ R \mid (R, D) \text{ is va-achievable} \},$

$D_{va}(R|\mathbf{X}) = \inf \{ D \mid (R, D) \text{ is va-achievable} \}.$

We call $R_{va}(D|\mathbf{X})$ and $D_{va}(R|\mathbf{X})$ the va-rate-distortion function and the
va-distortion-rate function, respectively.






We have now defined four kinds of rate-distortion problems and
the rate-distortion (distortion-rate) functions. Hereafter, we develop
properties of these functions, while focusing on the rate-distortion functions
because the distortion-rate functions are the inverses of the corresponding
rate-distortion functions.

First, we introduce the notion of "uniform integrability", which plays
a crucial role in analyses of the coding problems under the average distortion
criterion. In general, if a sequence of real-valued random variables $\{Z_n\}_{n=1}^{\infty}$
satisfies

$\lim_{u\to\infty} \sup_{n \ge 1} \sum_{z:\, |z| \ge u} P_{Z_n}(z) |z| = 0,$    (5.3.2)

$\{Z_n\}_{n=1}^{\infty}$ is said to satisfy the uniform integrability, where $Z_n$ takes values
in any real set $\mathcal{Z}_n$ $(n = 1, 2, \ldots)$. The following lemma shows one of the most
fundamental properties of the uniform integrability.

Lemma 5.3.1 (Billingsley [9]). If $\{Z_n\}_{n=1}^{\infty}$ satisfies the uniform
integrability, then it holds that

$\sum_{(z,v) \in A_n} P_{Z_n V_n}(z, v) |z| \to 0$ as $n \to \infty$    (5.3.3)

for any subset $A_n \subset \mathcal{Z}_n \times \mathcal{V}_n$ $(n = 1, 2, \ldots)$ satisfying $P_{Z_n V_n}(A_n) \to 0$ as
$n \to \infty$, where $V_n$ is an arbitrary random variable with arbitrary real
alphabet $\mathcal{V}_n$ that can be correlated with $Z_n$. □



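If there exists a reproduction sequence $\mathbf{r}^{(n)} \in \mathcal{Y}^n$, called the reference word,
such that

$\Bigl\{ \frac{1}{n} d_n(X^n, \mathbf{r}^{(n)}) \Bigr\}_{n=1}^{\infty}$

satisfies the uniform integrability, that is, if the condition

$\lim_{u\to\infty} \sup_{n \ge 1} \sum_{\mathbf{x}:\, \frac{1}{n} d_n(\mathbf{x}, \mathbf{r}^{(n)}) \ge u} P_{X^n}(\mathbf{x}) \frac{1}{n} d_n(\mathbf{x}, \mathbf{r}^{(n)}) = 0$    (5.3.4)

is satisfied, we say that the distortion measure $d_n$ satisfies the uniform
integrability for the source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ with the reference word $\mathbf{r}^{(n)}$
$(n = 1, 2, \ldots)$. Then, letting $V_n$ be an arbitrary random variable correlated
with $X^n$, Lemma 5.3.1 guarantees

$\sum_{(\mathbf{x}, v) \in A_n} P_{X^n V_n}(\mathbf{x}, v) \Bigl( \frac{1}{n} d_n(\mathbf{x}, \mathbf{r}^{(n)}) \Bigr) \to 0$ as $n \to \infty$    (5.3.5)

for any $A_n$ with $P_{X^n V_n}(A_n) \to 0$ as $n \to \infty$. This property plays an important
role in the proofs that appear subsequently.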






Example 5.3.1. If both a source alphabet $\mathcal{X}$ and a reproduction alphabet
$\mathcal{Y}$ are finite sets, then the additive distortion measure $d_n$ trivially satisfies
the uniform integrability (5.3.4). □

Example 5.3.2. If there exists a constant $d_{\max} > 0$ satisfying

$0 \le \frac{1}{n} d_n(\mathbf{x}, \mathbf{y}) \le d_{\max} \quad (\forall n = 1, 2, \ldots;\ \forall \mathbf{x} \in \mathcal{X}^n,\ \forall \mathbf{y} \in \mathcal{Y}^n),$    (5.3.6)

then the distortion measure $d_n$ satisfies the uniform integrability even if a
source alphabet $\mathcal{X}$ and a reproduction alphabet $\mathcal{Y}$ are not finite sets (the
bounded distortion measure). In this case the reference word can be
arbitrary. □

Example 5.3.3. Let a source alphabet $\mathcal{X}$ and a reproduction alphabet $\mathcal{Y}$ be
$\mathcal{X} = \mathcal{Y} = \mathbf{R} = (-\infty, +\infty)$. Suppose that $\mathbf{X} = (X_1, X_2, \ldots)$ is the stationary
memoryless Gaussian process. Then, the additive distortion measure

$d_n(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} (x_i - y_i)^2$    (5.3.7)

satisfies the uniform integrability with the reference word $\mathbf{r}^{(n)} = (0, 0, \ldots, 0) \in \mathcal{Y}^n$,
where $\mathbf{x} = (x_1, \ldots, x_n)$ and $\mathbf{y} = (y_1, \ldots, y_n)$ (see Lemma 5.9.1 below). □

Example 5.3.4. Let a source alphabet $\mathcal{X}$ and a reproduction alphabet $\mathcal{Y}$
be $\mathcal{X} = \mathcal{Y} = \mathbf{R}$. Suppose that $\mathbf{X} = (X_1, X_2, \ldots)$ is the stationary memoryless
process subject to the Cauchy distribution

$P_X(x) = \frac{a}{\pi (x^2 + a^2)} \quad (a > 0).$

While in this case neither the additive distortion measure (5.3.7) nor

$d_n(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} |x_i - y_i|$    (5.3.8)

satisfies the uniform integrability,

$d_n(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} \sqrt{|x_i - y_i|}$    (5.3.9)

satisfies the uniform integrability with the reference word $\mathbf{r}^{(n)} = (0, 0, \ldots, 0) \in \mathcal{Y}^n$
(see Lemma 5.9.1 below). □



The following lemma holds.

Lemma 5.3.2. If $\{Z_n\}_{n=1}^{\infty}$ satisfies the uniform integrability, then

$\limsup_{n\to\infty} E(Z_n) \le \mathrm{p}\text{-}\limsup_{n\to\infty} Z_n.$    (5.3.10)

Proof. Let $\gamma > 0$ be an arbitrarily small constant. We choose a sufficiently
large $u > 0$ satisfying

$\sup_{n \ge 1} \sum_{z:\, |z| \ge u} P_{Z_n}(z) |z| \le \gamma.$

Setting $D = \mathrm{p}\text{-}\limsup_{n\to\infty} Z_n$, it holds that

$E(Z_n) = E\{ Z_n \mathbf{1}[Z_n \le D + \gamma] \}$
$\quad + E\{ Z_n \mathbf{1}[D + \gamma < Z_n \le u] \}$
$\quad + E\{ Z_n \mathbf{1}[Z_n > u] \}$
$\quad \le (D + \gamma) \Pr\{ Z_n \le D + \gamma \} + u \Pr\{ Z_n > D + \gamma \} + \gamma$

for all $n = 1, 2, \ldots$. Since the definition of $D$ implies that

$\Pr\{ Z_n > D + \gamma \} \to 0$ as $n \to \infty,$

$\Pr\{ Z_n \le D + \gamma \} \to 1$ as $n \to \infty,$

it follows that

$\limsup_{n\to\infty} E(Z_n) \le D + 2\gamma.$

Since $\gamma > 0$ is arbitrary, we obtain $\limsup_{n\to\infty} E(Z_n) \le D$ by letting $\gamma \to 0$. □

We obtain the following theorem as a consequence of Lemma 5.3.2. 

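Theorem 5.3.1. For any source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and any distortion measure
$d_n$ satisfying the uniform integrability (5.3.4), we have

$$R_{va}(D|\mathbf{X}) \le R_{fa}(D|\mathbf{X}) \le R_{fm}(D|\mathbf{X}), \tag{5.3.11}$$
$$R_{va}(D|\mathbf{X}) \le R_{vm}(D|\mathbf{X}) \le R_{fm}(D|\mathbf{X}). \tag{5.3.12}$$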



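Proof. We prove (5.3.11) first. The first inequality is clear because the
variable-length code includes the fixed-length code as a special case. To
develop the second inequality, suppose that $(R, D)$ is $fm$-achievable. That is,
suppose that a fixed-length code $(\varphi_n, \psi_n)$ satisfying

$$\limsup_{n \to \infty} \frac{1}{n} \log \|\varphi_n\| \le R, \tag{5.3.13}$$
$$\operatorname*{p-lim\,sup}_{n \to \infty} \frac{1}{n} d_n(X^n, \psi_n(\varphi_n(X^n))) \le D \tag{5.3.14}$$

is given. We first set $M_n = \|\varphi_n\|$.
Since the distortion measure $d_n$ is assumed to satisfy the uniform integrability,
there exists a reference word $\mathbf{r}^{(n)} \in \mathcal{Y}^n$. We construct another fixed-length
code $(\tilde{\varphi}_n, \tilde{\psi}_n)$ in the following way. If

$$d_n(\mathbf{x}, \mathbf{r}^{(n)}) \ge d_n(\mathbf{x}, \psi_n(\varphi_n(\mathbf{x}))),$$

we define $\tilde{\varphi}_n$ by $\tilde{\varphi}_n(\mathbf{x}) = \varphi_n(\mathbf{x})$. If

$$d_n(\mathbf{x}, \mathbf{r}^{(n)}) < d_n(\mathbf{x}, \psi_n(\varphi_n(\mathbf{x}))),$$

we define $\tilde{\varphi}_n(\mathbf{x}) = M_n + 1$. We define the decoder $\tilde{\psi}_n$ by $\tilde{\psi}_n(i) = \psi_n(i)$ for
$i = 1, 2, \cdots, M_n$ and $\tilde{\psi}_n(i) = \mathbf{r}^{(n)}$ for $i = M_n + 1$. Then, it obviously holds
that

$$d_n(\mathbf{x}, \tilde{\psi}_n(\tilde{\varphi}_n(\mathbf{x}))) \le d_n(\mathbf{x}, \mathbf{r}^{(n)}) \quad (\forall \mathbf{x} \in \mathcal{X}^n), \tag{5.3.15}$$
$$d_n(\mathbf{x}, \tilde{\psi}_n(\tilde{\varphi}_n(\mathbf{x}))) \le d_n(\mathbf{x}, \psi_n(\varphi_n(\mathbf{x}))) \quad (\forall \mathbf{x} \in \mathcal{X}^n). \tag{5.3.16}$$

In addition, owing to (5.3.13) we obtain

$$\limsup_{n \to \infty} \frac{1}{n} \log \|\tilde{\varphi}_n\| = \limsup_{n \to \infty} \frac{1}{n} \log(\|\varphi_n\| + 1) \le R. \tag{5.3.17}$$

We note here that (5.3.15) guarantees that

$$\frac{1}{n} d_n(X^n, \tilde{\psi}_n(\tilde{\varphi}_n(X^n)))$$

also satisfies the uniform integrability because $d_n$ satisfies the uniform
integrability. Accordingly, due to Lemma 5.3.2, (5.3.14) and (5.3.16), we obtain

$$\begin{aligned}
\limsup_{n \to \infty} \frac{1}{n} E d_n(X^n, \tilde{\psi}_n(\tilde{\varphi}_n(X^n)))
&\le \operatorname*{p-lim\,sup}_{n \to \infty} \frac{1}{n} d_n(X^n, \tilde{\psi}_n(\tilde{\varphi}_n(X^n))) \\
&\le \operatorname*{p-lim\,sup}_{n \to \infty} \frac{1}{n} d_n(X^n, \psi_n(\varphi_n(X^n))) \\
&\le D.
\end{aligned} \tag{5.3.18}$$

The combination of (5.3.17) and (5.3.18) means that $(R, D)$ is $fa$-achievable.
Now we can conclude that $R_{fa}(D|\mathbf{X}) \le R_{fm}(D|\mathbf{X})$.

Next, we need to prove (5.3.12). Since the second inequality is clear because
the variable-length code includes the fixed-length code as a special case,
we have only to prove the first inequality. The first inequality can be proved
similarly to the proof of the second inequality in (5.3.11). □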
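The code modification used in the proof of Theorem 5.3.1 (add one extra index that decodes to the reference word) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `augment_with_reference`, its arguments, and the toy scalar code in the usage part are hypothetical and are not taken from the text.

```python
from typing import Callable, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def augment_with_reference(
    encode: Callable[[X], int],
    decode: Callable[[int], Y],
    num_codewords: int,
    reference: Y,
    distortion: Callable[[X, Y], float],
) -> Tuple[Callable[[X], int], Callable[[int], Y]]:
    """Return a code with num_codewords + 1 indices.

    The extra index num_codewords + 1 decodes to the reference word and is
    used exactly when the reference word is strictly closer to x than the
    original reproduction, so the new distortion never exceeds either one.
    """

    def new_encode(x: X) -> int:
        i = encode(x)
        if distortion(x, reference) >= distortion(x, decode(i)):
            return i
        return num_codewords + 1

    def new_decode(i: int) -> Y:
        return reference if i == num_codewords + 1 else decode(i)

    return new_encode, new_decode


if __name__ == "__main__":
    # Toy two-level scalar code with squared-error distortion (hypothetical).
    codebook = {1: -2.0, 2: 2.0}
    enc, dec = augment_with_reference(
        lambda x: 1 if x < 0 else 2,
        lambda i: codebook[i],
        2,
        0.0,
        lambda x, y: (x - y) ** 2,
    )
    print(enc(0.5), dec(enc(0.5)))
```

The size of the new code is the old size plus one, so the rate is asymptotically unchanged, in line with (5.3.17).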

We are interested in the structures of the four kinds of rate-distortion
functions

$$R_{fm}(D|\mathbf{X}), \quad R_{fa}(D|\mathbf{X}), \quad R_{vm}(D|\mathbf{X}), \quad R_{va}(D|\mathbf{X})$$

as functions of $D$. While these functions are defined in the form of
combinatorial optimization with respect to all permitted codes $(\varphi_n, \psi_n)$, we will
see in the following sections in sequence that they are expressed in terms of
the information-spectrum. In the general rate-distortion problems defined in
this section, the argument in §5.2 based on the weak law of large numbers
on the mutual information and the distortion measure is no longer available.
We will see that the information-spectrum approach plays a key role instead
of such an argument.



5.4 Rate-Distortion Function $R_{fm}(D|\mathbf{X})$



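Suppose that a general source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is given. Consider another
general source $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ taking values in the reproduction alphabets
$\{\mathcal{Y}^n\}_{n=1}^{\infty}$ ($\mathbf{Y}$ is called the reproduction process).* Define

$$\overline{D}(\mathbf{X}, \mathbf{Y}) = \operatorname*{p-lim\,sup}_{n \to \infty} \frac{1}{n} d_n(X^n, Y^n), \tag{5.4.1}$$

$$\overline{I}(\mathbf{X}; \mathbf{Y}) = \operatorname*{p-lim\,sup}_{n \to \infty} \frac{1}{n} \log \frac{P_{Y^n|X^n}(Y^n|X^n)}{P_{Y^n}(Y^n)}, \tag{5.4.2}$$

where $Y^n$ can be arbitrarily correlated with $X^n$. Recall that $\overline{I}(\mathbf{X}; \mathbf{Y})$ is the
spectral sup-mutual information rate of $(\mathbf{X}, \mathbf{Y})$ (see §3.5). Then, we have the
following theorem: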



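Theorem 5.4.1 (Steinberg and Verdú [85]). For any distortion measure $d_n$,

$$R_{fm}(D|\mathbf{X}) = \inf_{\mathbf{Y} : \overline{D}(\mathbf{X}, \mathbf{Y}) \le D} \overline{I}(\mathbf{X}; \mathbf{Y}), \tag{5.4.3}$$

where the infimum on the right-hand side is taken with respect to all general
sources $\mathbf{Y}$ satisfying $\overline{D}(\mathbf{X}, \mathbf{Y}) \le D$.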



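Proof.

1) Direct part:

We will prove that there exists a code $(\varphi_n, \psi_n)$ that satisfies

$$\operatorname*{p-lim\,sup}_{n \to \infty} \frac{1}{n} d_n(X^n, \psi_n(\varphi_n(X^n))) \le D, \tag{5.4.4}$$

$$\limsup_{n \to \infty} \frac{1}{n} \log \|\varphi_n\| \le \overline{I}(\mathbf{X}; \mathbf{Y}) \tag{5.4.5}$$

for a given $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ satisfying $\overline{D}(\mathbf{X}, \mathbf{Y}) \le D$. To this end, we construct
a random code $(\varphi_n, \psi_n)$ in the same way as was given in the proof 1) of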

* Throughout this chapter, depending on the context we indifferently use the term 
“process” or “sequence” to denote the general source (not necessarily satisfying 
the consistency) as defined in Chapter 1. 






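Theorem 5.2.1. We replace, however, the mutual information $I(X; Y)$ and the
expectation of the distortion $E d(X, Y)$ by $\overline{I}(\mathbf{X}; \mathbf{Y})$ and $\overline{D}(\mathbf{X}, \mathbf{Y})$, respectively,
and set $M_n = \|\varphi_n\| = e^{n(\overline{I}(\mathbf{X}; \mathbf{Y}) + 2\gamma)}$. In addition, we use

$$T_n^{(1)} = \left\{ (\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n \,\middle|\, \frac{1}{n} \log \frac{P_{Y^n|X^n}(\mathbf{y}|\mathbf{x})}{P_{Y^n}(\mathbf{y})} \le \overline{I}(\mathbf{X}; \mathbf{Y}) + \gamma \right\}, \tag{5.4.6}$$

$$T_n^{(2)} = \left\{ (\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n \,\middle|\, \frac{1}{n} d_n(\mathbf{x}, \mathbf{y}) \le \overline{D}(\mathbf{X}, \mathbf{Y}) + \gamma \right\} \tag{5.4.7}$$

instead of (5.2.7) and (5.2.8), respectively, and set $T_n = T_n^{(1)} \cap T_n^{(2)}$. Under
this setting if we define

$$P_e^{(n)} = \Pr\left\{ \frac{1}{n} d_n(X^n, \psi_n(\varphi_n(X^n))) > \overline{D}(\mathbf{X}, \mathbf{Y}) + \gamma \right\}, \tag{5.4.8}$$

$P_e^{(n)}$ turns out to have the upper bound given in (5.2.22) from the argument
in the proof 1) of Theorem 5.2.1. That is, it holds that

$$P_e^{(n)} \le \Pr\{X^n Y^n \notin T_n\} + e^{-e^{n\gamma}}. \tag{5.4.9}$$

On the other hand, since the definitions of $\overline{I}(\mathbf{X}; \mathbf{Y})$ and $\overline{D}(\mathbf{X}, \mathbf{Y})$ imply

$$\Pr\{X^n Y^n \notin T_n^{(1)}\} \to 0, \quad \Pr\{X^n Y^n \notin T_n^{(2)}\} \to 0 \quad \text{as } n \to \infty,$$

we have $\Pr\{X^n Y^n \notin T_n\} \to 0$ as $n \to \infty$. Thus, $P_e^{(n)} \to 0$ as $n \to \infty$ follows
from (5.4.9). This, together with (5.4.8), means

$$\Pr\left\{ \frac{1}{n} d_n(X^n, \psi_n(\varphi_n(X^n))) > \overline{D}(\mathbf{X}, \mathbf{Y}) + \gamma \right\} \to 0 \quad \text{as } n \to \infty, \tag{5.4.10}$$

which guarantees the existence of a deterministic code $(\varphi_n, \psi_n)$ satisfying
(5.4.10). This means

$$\operatorname*{p-lim\,sup}_{n \to \infty} \frac{1}{n} d_n(X^n, \psi_n(\varphi_n(X^n))) \le \overline{D}(\mathbf{X}, \mathbf{Y}) + \gamma \le D + \gamma.$$

In addition, since $M_n = \|\varphi_n\| = e^{n(\overline{I}(\mathbf{X}; \mathbf{Y}) + 2\gamma)}$, it trivially holds that

$$\limsup_{n \to \infty} \frac{1}{n} \log \|\varphi_n\| \le \overline{I}(\mathbf{X}; \mathbf{Y}) + 2\gamma.$$

In order to complete the proof, we choose a positive sequence $\{\gamma_k\}_{k=1}^{\infty}$ satisfying
$\gamma_1 > \gamma_2 > \cdots > 0$ and $\gamma_k \to 0$ as $k \to \infty$ and repeat the same argument in the
order of $\gamma = \gamma_1, \gamma = \gamma_2, \cdots$. This develops the existence of the fixed-length
code $(\varphi_n, \psi_n)$ satisfying (5.4.4) and (5.4.5). Now we can conclude that

$$R_{fm}(D|\mathbf{X}) \le \inf_{\mathbf{Y} : \overline{D}(\mathbf{X}, \mathbf{Y}) \le D} \overline{I}(\mathbf{X}; \mathbf{Y})$$

(the diagonal line argument; see the proof 1) of Theorem 1.8.2 in Chapter 1).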



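2) Converse part:

Suppose that $(R, D)$ is $fm$-achievable. That is, suppose that there exists
a code $(\varphi_n, \psi_n)$ satisfying

$$\operatorname*{p-lim\,sup}_{n \to \infty} \frac{1}{n} d_n(X^n, \psi_n(\varphi_n(X^n))) \le D, \tag{5.4.11}$$

$$\limsup_{n \to \infty} \frac{1}{n} \log \|\varphi_n\| \le R. \tag{5.4.12}$$

By setting $Y^n = \psi_n(\varphi_n(X^n))$ and $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$, (5.4.11) immediately yields

$$\overline{D}(\mathbf{X}, \mathbf{Y}) \le D. \tag{5.4.13}$$

Next, we prove

$$\overline{I}(\mathbf{X}; \mathbf{Y}) \le R. \tag{5.4.14}$$

Notice that $Y^n = \psi_n(\varphi_n(X^n))$ cannot take more than $\|\varphi_n\|$ values, where
$\|\varphi_n\|$ denotes the size of the range of the encoder $\varphi_n$. Then, Lemma 2.6.2 in
Chapter 2 tells us that

$$\Pr\left\{ \frac{1}{n} \log \frac{1}{P_{Y^n}(Y^n)} \ge \frac{1}{n} \log \|\varphi_n\| + \gamma \right\} \le e^{-n\gamma}, \tag{5.4.15}$$

where $\gamma > 0$ is an arbitrary constant. On the other hand, since

$$\frac{1}{n} \log \frac{P_{Y^n|X^n}(\mathbf{y}|\mathbf{x})}{P_{Y^n}(\mathbf{y})} \le \frac{1}{n} \log \frac{1}{P_{Y^n}(\mathbf{y})},$$

it follows from (5.4.15) that

$$\Pr\left\{ \frac{1}{n} \log \frac{P_{Y^n|X^n}(Y^n|X^n)}{P_{Y^n}(Y^n)} \ge \frac{1}{n} \log \|\varphi_n\| + \gamma \right\} \le e^{-n\gamma}. \tag{5.4.16}$$

By noticing

$$\frac{1}{n} \log \|\varphi_n\| \le \limsup_{n \to \infty} \frac{1}{n} \log \|\varphi_n\| + \gamma \quad (\forall n \ge n_0),$$

we have

$$\operatorname*{p-lim\,sup}_{n \to \infty} \frac{1}{n} \log \frac{P_{Y^n|X^n}(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \limsup_{n \to \infty} \frac{1}{n} \log \|\varphi_n\| + 2\gamma \le R + 2\gamma,$$

where the second inequality follows from (5.4.12). Therefore, we obtain

$$\overline{I}(\mathbf{X}; \mathbf{Y}) \le R + 2\gamma.$$

This means $\overline{I}(\mathbf{X}; \mathbf{Y}) \le R$ because $\gamma > 0$ is arbitrary. Combining $\overline{I}(\mathbf{X}; \mathbf{Y}) \le R$
with (5.4.13) completes the proof of

$$R_{fm}(D|\mathbf{X}) \ge \inf_{\mathbf{Y} : \overline{D}(\mathbf{X}, \mathbf{Y}) \le D} \overline{I}(\mathbf{X}; \mathbf{Y}).$$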






5.5 Rate-Distortion Function $R_{fa}(D|\mathbf{X})$

In this section we consider a general source $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ taking values in
the reproduction alphabets $\{\mathcal{Y}^n\}_{n=1}^{\infty}$ again. Here we define

$$D(\mathbf{X}, \mathbf{Y}) = \limsup_{n \to \infty} \frac{1}{n} E d_n(X^n, Y^n). \tag{5.5.1}$$

We have the following theorem that is expressed in terms of $\overline{I}(\mathbf{X}; \mathbf{Y})$ defined
by (5.4.2).

Theorem 5.5.1 (Han [35]). If a distortion measure $d_n$ satisfies the uniform
integrability (5.3.4), then

$$R_{fa}(D|\mathbf{X}) = \inf_{\mathbf{Y} : D(\mathbf{X}, \mathbf{Y}) \le D} \overline{I}(\mathbf{X}; \mathbf{Y}). \tag{5.5.2}$$

Remark 5.5.1. Steinberg and Verdú [85] show that a claim corresponding
to Theorem 5.5.1 holds under the condition (5.3.6) that is much stronger
than the uniform integrability (5.3.4). □



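Proof of Theorem 5.5.1.

1) Direct part:

We prove this part by using the random code given in the proof 1) of
Theorem 5.4.1. It is sufficient to prove that, given $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ satisfying
$D(\mathbf{X}, \mathbf{Y}) \le D$, there exists a code $(\varphi_n, \psi_n)$ satisfying

$$\limsup_{n \to \infty} \frac{1}{n} E d_n(X^n, \psi_n(\varphi_n(X^n))) \le D, \tag{5.5.3}$$

$$\limsup_{n \to \infty} \frac{1}{n} \log \|\varphi_n\| \le \overline{I}(\mathbf{X}; \mathbf{Y}) + 2\gamma, \tag{5.5.4}$$

where $\gamma > 0$ is an arbitrarily small constant. Letting $u$ be an arbitrary
nonnegative real number, define

$$T_n = \left\{ (\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n \,\middle|\, \frac{1}{n} \log \frac{P_{Y^n|X^n}(\mathbf{y}|\mathbf{x})}{P_{Y^n}(\mathbf{y})} \le \overline{I}(\mathbf{X}; \mathbf{Y}) + \gamma \right\}, \tag{5.5.5}$$

$$S_n(u) = T_n \cap \left\{ (\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n \,\middle|\, \frac{1}{n} d_n(\mathbf{x}, \mathbf{y}) \le u \right\}. \tag{5.5.6}$$

Next, set

$$M_n = \|\varphi_n\| = e^{n(\overline{I}(\mathbf{X}; \mathbf{Y}) + 2\gamma)}$$

and denote by $\mathbf{r}^{(n)} \in \mathcal{Y}^n$ the reference word. The existence of the reference
word is guaranteed by the assumption that the distortion measure $d_n$ satisfies
the uniform integrability. We independently generate $\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_{M_n} \in \mathcal{Y}^n$
subject to the identical probability distribution $P_{Y^n}$ and set $\mathcal{C} = \{\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_{M_n}\}$
(random coding). For the case of

$$d_n(\mathbf{x}, \mathbf{r}^{(n)}) > \min_{1 \le i \le M_n} d_n(\mathbf{x}, \mathbf{v}_i), \tag{5.5.7}$$

we define an encoder $\varphi_n$ by $\varphi_n(\mathbf{x}) = i_0$, where $i_0$ is determined from

$$d_n(\mathbf{x}, \mathbf{v}_{i_0}) = \min_{1 \le i \le M_n} d_n(\mathbf{x}, \mathbf{v}_i). \tag{5.5.8}$$

For the case of

$$d_n(\mathbf{x}, \mathbf{r}^{(n)}) \le \min_{1 \le i \le M_n} d_n(\mathbf{x}, \mathbf{v}_i), \tag{5.5.9}$$

we define the encoder $\varphi_n$ by $\varphi_n(\mathbf{x}) = M_n + 1$. A decoder $\psi_n$ is defined by
$\psi_n(i) = \mathbf{v}_i$ for $i = 1, 2, \cdots, M_n$ and $\psi_n(i) = \mathbf{r}^{(n)}$ for $i = M_n + 1$. In order to
analyze the performance of the random code $(\varphi_n, \psi_n)$, we define

$$P_e^{(n)}(u) = \Pr\left\{ \frac{1}{n} d_n(X^n, \psi_n(\varphi_n(X^n))) > u \right\}. \tag{5.5.10}$$

For simplicity, set

$$A_n(\mathbf{x}) = P_{X^n}(\mathbf{x}) \, 1\!\left[ \frac{1}{n} d_n(\mathbf{x}, \mathbf{r}^{(n)}) > u \right]. \tag{5.5.11}$$

Then, $P_e^{(n)}(u)$ can be expressed in the following way:

$$\begin{aligned}
P_e^{(n)}(u) &= \sum_{\mathbf{x} \in \mathcal{X}^n} A_n(\mathbf{x}) \prod_{i=1}^{M_n} \Pr\left\{ \frac{1}{n} d_n(\mathbf{x}, \mathbf{v}_i) > u \right\} \\
&= \sum_{\mathbf{x} \in \mathcal{X}^n} A_n(\mathbf{x}) \left( \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n}(\mathbf{y}) \, 1\!\left[ \frac{1}{n} d_n(\mathbf{x}, \mathbf{y}) > u \right] \right)^{M_n} \\
&= \sum_{\mathbf{x} \in \mathcal{X}^n} A_n(\mathbf{x}) \left( 1 - \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n}(\mathbf{y}) \, 1\!\left[ \frac{1}{n} d_n(\mathbf{x}, \mathbf{y}) \le u \right] \right)^{M_n},
\end{aligned} \tag{5.5.12}$$

where we use the property that $\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_{M_n}$ are independently generated
subject to the identical distribution $P_{Y^n}$. Now, define

$$J_u(\mathbf{x}, \mathbf{y}) = \begin{cases} 1 & \text{for } (\mathbf{x}, \mathbf{y}) \in S_n(u), \\ 0 & \text{otherwise.} \end{cases} \tag{5.5.13}$$

Since we have

$$J_u(\mathbf{x}, \mathbf{y}) \le 1\!\left[ \frac{1}{n} d_n(\mathbf{x}, \mathbf{y}) \le u \right],$$

$P_e^{(n)}(u)$ in (5.5.12) has the following upper bound:

$$P_e^{(n)}(u) \le \sum_{\mathbf{x} \in \mathcal{X}^n} A_n(\mathbf{x}) \left( 1 - \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n}(\mathbf{y}) J_u(\mathbf{x}, \mathbf{y}) \right)^{M_n}. \tag{5.5.14}$$

Notice that, if $(\mathbf{x}, \mathbf{y}) \in S_n(u)$, it holds that

$$P_{Y^n}(\mathbf{y}) \ge e^{-n(\overline{I}(\mathbf{X}; \mathbf{Y}) + \gamma)} P_{Y^n|X^n}(\mathbf{y}|\mathbf{x})$$

because $(\mathbf{x}, \mathbf{y}) \in T_n$. Thus, we have

$$P_{Y^n}(\mathbf{y}) J_u(\mathbf{x}, \mathbf{y}) \ge e^{-n(\overline{I}(\mathbf{X}; \mathbf{Y}) + \gamma)} P_{Y^n|X^n}(\mathbf{y}|\mathbf{x}) J_u(\mathbf{x}, \mathbf{y})$$

for all $(\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n$. By substituting this inequality into the right-hand
side of (5.5.14), we obtain

$$P_e^{(n)}(u) \le \sum_{\mathbf{x} \in \mathcal{X}^n} A_n(\mathbf{x}) \left( 1 - e^{-n(\overline{I}(\mathbf{X}; \mathbf{Y}) + \gamma)} \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n|X^n}(\mathbf{y}|\mathbf{x}) J_u(\mathbf{x}, \mathbf{y}) \right)^{M_n}.$$

By noting (5.5.11) and applying the same argument given in the proof 1) of
Theorem 5.2.1 using the inequalities (5.2.20) and (5.2.21), we have

$$P_e^{(n)}(u) \le \Pr\left\{ X^n Y^n \notin S_n(u) \text{ and } \frac{1}{n} d_n(X^n, \mathbf{r}^{(n)}) > u \right\} + e^{-e^{n\gamma}} \Pr\left\{ \frac{1}{n} d_n(X^n, \mathbf{r}^{(n)}) > u \right\}, \tag{5.5.15}$$

which corresponds to (5.2.22). Therefore, it follows from (5.5.6) and (5.5.15)
that

$$\begin{aligned}
P_e^{(n)}(u) &\le \Pr\left\{ \frac{1}{n} d_n(X^n, Y^n) > u \text{ and } \frac{1}{n} d_n(X^n, \mathbf{r}^{(n)}) > u \right\} \\
&\quad + \Pr\left\{ X^n Y^n \notin T_n \text{ and } \frac{1}{n} d_n(X^n, \mathbf{r}^{(n)}) > u \right\} \\
&\quad + e^{-e^{n\gamma}} \Pr\left\{ \frac{1}{n} d_n(X^n, \mathbf{r}^{(n)}) > u \right\} \\
&\le \Pr\left\{ \frac{1}{n} d_n(X^n, Y^n) > u \right\} \\
&\quad + \Pr\left\{ X^n Y^n \notin T_n \text{ and } \frac{1}{n} d_n(X^n, \mathbf{r}^{(n)}) > u \right\} \\
&\quad + e^{-e^{n\gamma}} \Pr\left\{ \frac{1}{n} d_n(X^n, \mathbf{r}^{(n)}) > u \right\}.
\end{aligned} \tag{5.5.16}$$

Now we need the following well known lemma.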





Lemma 5.5.1.^ Let $Z$ be any random variable such that $Z \ge c$ for some
constant $c$ with probability one. Then, we have

$$E(Z) = \int_c^{\infty} P_Z\{Z > z\} \, dz + c = \int_c^{\infty} P_Z\{Z \ge z\} \, dz + c. \tag{5.5.17}$$

Proof. Define the indicator function

$$f(z, u) = 1[u < z]. \tag{5.5.18}$$

Then by means of Fubini's theorem, we have

$$\int_c^{\infty} du \int_{[c, \infty)} f(z, u) P_Z(dz) = \int_{[c, \infty)} P_Z(dz) \int_c^{\infty} f(z, u) \, du. \tag{5.5.19}$$

The left-hand side of (5.5.19) is transformed as follows:

$$\int_c^{\infty} du \int_{(u, \infty)} P_Z(dz) = \int_c^{\infty} P_Z\{Z > u\} \, du.$$

Similarly, the right-hand side of (5.5.19) is transformed as follows:

$$\int_{[c, \infty)} P_Z(dz) \int_c^{z} du = \int_{[c, \infty)} (z - c) P_Z(dz) = E(Z) - c.$$

Therefore, we have the first equality in (5.5.17). On the other hand, we
similarly have the second equality in (5.5.17) by replacing $f(z, u)$ by $g(z, u) =
1[u \le z]$. □
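For a random variable with finitely many atoms, the identity of Lemma 5.5.1 can be checked exactly, since $z \mapsto \Pr\{Z > z\}$ is a step function. The following Python sketch is an illustration only; the helper name `tail_integral` and the example distribution are assumptions made here. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from typing import Sequence


def tail_integral(values: Sequence[Fraction], probs: Sequence[Fraction], c: Fraction) -> Fraction:
    """Integral over [c, infinity) of Pr{Z > z} dz for a finite discrete Z >= c.

    Between consecutive atoms the tail probability is constant, so the
    integral is a finite sum of rectangle areas.
    """
    total = Fraction(0)
    prev = c
    tail = Fraction(1)  # Pr{Z > z} for z in [prev, next atom)
    for v, p in sorted(zip(values, probs)):
        total += tail * (v - prev)
        tail -= p
        prev = v
    return total


if __name__ == "__main__":
    vals = [Fraction(-1), Fraction(0), Fraction(2)]
    prs = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
    c = Fraction(-1)
    mean = sum(v * p for v, p in zip(vals, prs))
    print(mean, tail_integral(vals, prs, c) + c)
```

Both printed values agree, which is the first equality in (5.5.17) for this example with $c = -1$.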



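An immediate consequence of the definition (5.5.10) of $P_e^{(n)}(u)$ and
Lemma 5.5.1 with $c = 0$, $Z = \frac{1}{n} d_n(X^n, \psi_n(\varphi_n(X^n)))$ is

$$\frac{1}{n} E_{\mathcal{C}} E d_n(X^n, \psi_n(\varphi_n(X^n))) = \int_0^{\infty} P_e^{(n)}(u) \, du, \tag{5.5.20}$$

where $E$ and $E_{\mathcal{C}}$ denote the expectations with respect to $X^n$ and the random
code $\mathcal{C}$, respectively.

Furthermore, again by means of Lemma 5.5.1, we have

$$\int_0^{\infty} \Pr\left\{ \frac{1}{n} d_n(X^n, Y^n) > u \right\} du = \frac{1}{n} E d_n(X^n, Y^n), \tag{5.5.21}$$

$$\int_0^{\infty} \Pr\left\{ \frac{1}{n} d_n(X^n, \mathbf{r}^{(n)}) > u \right\} du = \frac{1}{n} E d_n(X^n, \mathbf{r}^{(n)}). \tag{5.5.22}$$

^ suggested by Shunsuke Ihara.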



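On the other hand, we have

$$\begin{aligned}
&\int_0^{\infty} \Pr\left\{ X^n Y^n \notin T_n \text{ and } \frac{1}{n} d_n(X^n, \mathbf{r}^{(n)}) > u \right\} du \\
&\quad = \int_0^{\infty} \sum_{(\mathbf{x}, \mathbf{y}) \notin T_n} P_{X^n Y^n}(\mathbf{x}, \mathbf{y}) \, 1\!\left[ \frac{1}{n} d_n(\mathbf{x}, \mathbf{r}^{(n)}) > u \right] du \\
&\quad = \sum_{(\mathbf{x}, \mathbf{y}) \notin T_n} P_{X^n Y^n}(\mathbf{x}, \mathbf{y}) \int_0^{\infty} 1\!\left[ \frac{1}{n} d_n(\mathbf{x}, \mathbf{r}^{(n)}) > u \right] du \\
&\quad = \sum_{(\mathbf{x}, \mathbf{y}) \notin T_n} P_{X^n Y^n}(\mathbf{x}, \mathbf{y}) \left( \frac{1}{n} d_n(\mathbf{x}, \mathbf{r}^{(n)}) \right).
\end{aligned}$$

Since $\Pr\{X^n Y^n \notin T_n\} \to 0$ as $n \to \infty$ from the definition of $T_n$, it holds
that

$$\sum_{(\mathbf{x}, \mathbf{y}) \notin T_n} P_{X^n Y^n}(\mathbf{x}, \mathbf{y}) \left( \frac{1}{n} d_n(\mathbf{x}, \mathbf{r}^{(n)}) \right) \to 0 \quad \text{as } n \to \infty$$

owing to the assumption of the uniform integrability (5.3.4) of the distortion
measure. Thus, we obtain

$$B_n \equiv \int_0^{\infty} \Pr\left\{ X^n Y^n \notin T_n \text{ and } \frac{1}{n} d_n(X^n, \mathbf{r}^{(n)}) > u \right\} du \to 0. \tag{5.5.23}$$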

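By substituting (5.5.16) into the right-hand side of (5.5.20) and using (5.5.21)
and (5.5.22), it follows that

$$\frac{1}{n} E_{\mathcal{C}} E d_n(X^n, \psi_n(\varphi_n(X^n))) \le \frac{1}{n} E d_n(X^n, Y^n) + e^{-e^{n\gamma}} \left( \frac{1}{n} E d_n(X^n, \mathbf{r}^{(n)}) \right) + B_n. \tag{5.5.24}$$

This means that there exists at least one deterministic code $(\varphi_n, \psi_n)$
satisfying

$$\frac{1}{n} E d_n(X^n, \psi_n(\varphi_n(X^n))) \le \frac{1}{n} E d_n(X^n, Y^n) + e^{-e^{n\gamma}} \left( \frac{1}{n} E d_n(X^n, \mathbf{r}^{(n)}) \right) + B_n. \tag{5.5.25}$$

By noting $e^{-e^{n\gamma}} \to 0$ as $n \to \infty$ and (5.5.23), we obtain

$$\limsup_{n \to \infty} \frac{1}{n} E d_n(X^n, \psi_n(\varphi_n(X^n))) \le \limsup_{n \to \infty} \frac{1}{n} E d_n(X^n, Y^n) = D(\mathbf{X}, \mathbf{Y}) \le D,$$

where we use

$$\sup_{n \ge 1} \left( \frac{1}{n} E d_n(X^n, \mathbf{r}^{(n)}) \right) < +\infty,$$

which follows from the uniform integrability of the distortion measure. In
addition, since $M_n$ is defined as $M_n = e^{n(\overline{I}(\mathbf{X}; \mathbf{Y}) + 2\gamma)}$, it trivially holds that

$$\limsup_{n \to \infty} \frac{1}{n} \log \|\varphi_n\| \le \limsup_{n \to \infty} \frac{1}{n} \log(M_n + 1) \le \overline{I}(\mathbf{X}; \mathbf{Y}) + 2\gamma.$$

This completes the proof of the existence of a code $(\varphi_n, \psi_n)$ satisfying (5.5.3)
and (5.5.4).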

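2) Converse part:

Suppose that $(R, D)$ is $fa$-achievable. That is, suppose that there exists
a code $(\varphi_n, \psi_n)$ satisfying

$$\limsup_{n \to \infty} \frac{1}{n} E d_n(X^n, \psi_n(\varphi_n(X^n))) \le D, \tag{5.5.26}$$

$$\limsup_{n \to \infty} \frac{1}{n} \log \|\varphi_n\| \le R. \tag{5.5.27}$$

Setting $Y^n = \psi_n(\varphi_n(X^n))$ and defining $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$, (5.5.26) immediately
leads to

$$D(\mathbf{X}, \mathbf{Y}) \le D. \tag{5.5.28}$$

On the other hand, by using Lemma 2.6.2 similarly to the proof 2) in
Theorem 5.4.1, it follows from (5.5.27) that

$$\overline{I}(\mathbf{X}; \mathbf{Y}) \le R. \tag{5.5.29}$$

The combination of (5.5.28) and (5.5.29) establishes

$$R_{fa}(D|\mathbf{X}) \ge \inf_{\mathbf{Y} : D(\mathbf{X}, \mathbf{Y}) \le D} \overline{I}(\mathbf{X}; \mathbf{Y}).$$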



5.6 Rate-Distortion Function $R_{vm}(D|\mathbf{X})$



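For a general source $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ taking values in the reproduction
alphabets $\{\mathcal{Y}^n\}_{n=1}^{\infty}$, define

$$I(\mathbf{X}; \mathbf{Y}) = \limsup_{n \to \infty} \frac{1}{n} E\left[ \log \frac{P_{Y^n|X^n}(Y^n|X^n)}{P_{Y^n}(Y^n)} \right], \tag{5.6.1}$$

where the base of the logarithm on the right-hand side is $K = |\mathcal{U}|$ and $\mathcal{U}$
is a code alphabet. If we define $\overline{D}(\mathbf{X}, \mathbf{Y})$ by (5.4.1), we have the following
theorem.

Theorem 5.6.1 (Han [35]). For any distortion measure $d_n$,

$$R_{vm}(D|\mathbf{X}) = \inf_{\mathbf{Y} : \overline{D}(\mathbf{X}, \mathbf{Y}) \le D} I(\mathbf{X}; \mathbf{Y}). \tag{5.6.2}$$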



Proof. The proof of the direct part is longer and much more complicated than 
the proofs of the direct part of Theorem 5.4.1 and Theorem 5.5.1. However, 
the proof is interesting because we can effectively use the information-slicing 
technique. We first prove the converse part, which is more easily established 
than the direct part. 

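1) Converse part:

Suppose that $(R, D)$ is $vm$-achievable. That is, suppose that a
variable-length code $(\varphi_n, \psi_n)$ satisfying

$$\limsup_{n \to \infty} \frac{1}{n} E|\varphi_n(X^n)| \le R, \tag{5.6.3}$$

$$\operatorname*{p-lim\,sup}_{n \to \infty} \frac{1}{n} d_n(X^n, \psi_n(\varphi_n(X^n))) \le D \tag{5.6.4}$$

is given. Setting $Y^n = \psi_n(\varphi_n(X^n))$ and $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$, (5.6.4) immediately
implies

$$\overline{D}(\mathbf{X}, \mathbf{Y}) \le D. \tag{5.6.5}$$

Now, we define the random variable $U_n$ by $U_n = \varphi_n(X^n)$. Since $U_n$ takes
values in a set satisfying the prefix condition, it must hold that

$$E|\varphi_n(X^n)| = E|U_n| \ge H(U_n)$$

as was proved in the converse part of Theorem 1.2.1 in Chapter 1, where the
logarithm is to the base $K = |\mathcal{U}|$. Thus, we obtain

$$E|\varphi_n(X^n)| \ge H(Y^n)$$

because $Y^n = \psi_n(U_n)$ implies $H(Y^n) \le H(U_n)$. Hence, it holds that

$$E|\varphi_n(X^n)| \ge H(Y^n) - H(Y^n|X^n) = I(X^n; Y^n),$$

which yields

$$\limsup_{n \to \infty} \frac{1}{n} E|\varphi_n(X^n)| \ge \limsup_{n \to \infty} \frac{1}{n} I(X^n; Y^n) = I(\mathbf{X}; \mathbf{Y}).$$

This, together with (5.6.3), yields

$$I(\mathbf{X}; \mathbf{Y}) \le R. \tag{5.6.6}$$

We obtain

$$R_{vm}(D|\mathbf{X}) \ge \inf_{\mathbf{Y} : \overline{D}(\mathbf{X}, \mathbf{Y}) \le D} I(\mathbf{X}; \mathbf{Y})$$

from the combination of (5.6.5) and (5.6.6).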
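The two entropy inequalities used in the converse part, $E|U_n| \ge H(U_n)$ for a prefix code and $H(\psi_n(U_n)) \le H(U_n)$, can be illustrated on a small example. The sketch below is a hypothetical toy case (the codewords, probabilities and helper names are assumptions made here), with a binary code alphabet so that logarithms are to base $K = 2$.

```python
import math
from typing import Sequence


def is_prefix_free(words: Sequence[str]) -> bool:
    """True if no codeword is a proper prefix of a different codeword."""
    return not any(a != b and b.startswith(a) for a in words for b in words)


def expected_length(words: Sequence[str], probs: Sequence[float]) -> float:
    """Average codeword length E|U|."""
    return sum(len(w) * p for w, p in zip(words, probs))


def entropy(probs: Sequence[float], base: int) -> float:
    """Entropy with logarithms taken to the given base."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


if __name__ == "__main__":
    words = ["0", "10", "110", "111"]
    probs = [0.4, 0.3, 0.2, 0.1]
    print(is_prefix_free(words), expected_length(words, probs), entropy(probs, 2))
    # A decoder that maps "110" and "111" to the same reproduction word
    # merges their probabilities, which cannot increase the entropy.
    print(entropy([0.4, 0.3, 0.3], 2))
```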






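2) Direct part:

Let $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ be a general source taking values in reproduction alphabets
$\{\mathcal{Y}^n\}_{n=1}^{\infty}$. We prove that $(R, D)$ is $vm$-achievable for an arbitrarily given
$(R, D)$ satisfying

$$I(\mathbf{X}; \mathbf{Y}) \le R, \tag{5.6.7}$$

$$\overline{D}(\mathbf{X}, \mathbf{Y}) \le D. \tag{5.6.8}$$

To this end, we show the existence of a variable-length code $(\varphi_n, \psi_n)$
satisfying (5.6.3) and (5.6.4). For simplicity, set

$$i(\mathbf{x}; \mathbf{y}) = \log_K \frac{P_{Y^n|X^n}(\mathbf{y}|\mathbf{x})}{P_{Y^n}(\mathbf{y})}$$

and for an arbitrarily small constant $\gamma > 0$ define

$$D_n = \left\{ (\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n \,\middle|\, \frac{1}{n} d_n(\mathbf{x}, \mathbf{y}) \le \overline{D}(\mathbf{X}, \mathbf{Y}) + \gamma \right\},$$

$$S_n = D_n \cap \left\{ (\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n \,\middle|\, -\gamma < \frac{1}{n} i(\mathbf{x}; \mathbf{y}) \le \overline{I}(\mathbf{X}; \mathbf{Y}) + \gamma \right\}.$$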

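Next, set

$$\gamma_n = \Pr\{X^n Y^n \notin D_n\},$$

$$\mu_n = \Pr\left\{ \frac{1}{n} i(X^n; Y^n) \le -\gamma \ \text{ or } \ \frac{1}{n} i(X^n; Y^n) > \overline{I}(\mathbf{X}; \mathbf{Y}) + \gamma \right\}.$$

Then, we obtain

$$\gamma_n \to 0, \quad \mu_n \to 0 \quad \text{as } n \to \infty \tag{5.6.9}$$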

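from the definitions of $\overline{D}(\mathbf{X}, \mathbf{Y})$, $\overline{I}(\mathbf{X}; \mathbf{Y})$ and $\underline{I}(\mathbf{X}; \mathbf{Y})$ and (3.2.3) in
Chapter 3. We now define $L = (\overline{I}(\mathbf{X}; \mathbf{Y}) + \gamma)/\gamma$ and partition the set $S_n$ into $L + 1$
subsets $S_n^{(l)}$ $(l = 0, 1, \cdots, L)$ defined as follows:

$$S_n^{(l)} = \left\{ (\mathbf{x}, \mathbf{y}) \in D_n \,\middle|\, (l - 1)\gamma < \frac{1}{n} i(\mathbf{x}; \mathbf{y}) \le l\gamma \right\}$$

(information-spectrum slicing). Clearly, we have

$$S_n = \bigcup_{l=0}^{L} S_n^{(l)}.$$

We define a variable-length code $(\tilde{\varphi}_n, \tilde{\psi}_n)$ based on the partition of the
information-spectrum.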
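The slicing rule can be sketched numerically: a normalized information density value is assigned to the unique slice index $l$ with $(l-1)\gamma < \frac{1}{n} i \le l\gamma$. The Python sketch below only handles the information-density condition and ignores membership in $D_n$; the function name `slice_index` and the numerical values in the usage part are assumptions made for illustration.

```python
import math
from typing import Optional


def slice_index(rate: float, gamma: float, i_sup: float) -> Optional[int]:
    """Index l with (l - 1) * gamma < rate <= l * gamma.

    `rate` stands for (1/n) i(x; y) and `i_sup` for the spectral
    sup-mutual information rate. Returns None when rate lies outside
    the sliced interval (-gamma, i_sup + gamma].
    """
    if not (-gamma < rate <= i_sup + gamma):
        return None
    # For rate in (-gamma, 0] the ceiling is 0; max() also normalizes -0.
    return max(0, math.ceil(rate / gamma))


if __name__ == "__main__":
    gamma, i_sup = 0.25, 1.0  # hypothetical values, L = (1.0 + 0.25) / 0.25 = 5
    for rate in (-0.1, 0.0, 0.25, 0.26, 1.25, 1.3):
        print(rate, slice_index(rate, gamma, i_sup))
```

A pair falling in slice $l$ is later described at rate $R_l = (l + 1)\gamma$, so pairs with small information density receive short codewords. In floating-point arithmetic the ceiling can be off by one at slice boundaries when $\gamma$ is not exactly representable, so exact rationals would be preferable in a careful implementation.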



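a) Generation of a random code:

For $l = 0, 1, \cdots, L$ define

$$R_l = (l + 1)\gamma, \tag{5.6.10}$$

$$M_l = K^{n R_l} \tag{5.6.11}$$

$(K = |\mathcal{U}|)$ and generate $M_l$ codewords $\mathbf{v}_{l,1}, \mathbf{v}_{l,2}, \cdots, \mathbf{v}_{l,M_l} \in \mathcal{Y}^n$
independently subject to the probability distribution $P_{Y^n}$ (random coding). Set

$$\mathcal{C}_l = \{\mathbf{v}_{l,1}, \mathbf{v}_{l,2}, \cdots, \mathbf{v}_{l,M_l}\}$$

and denote by $A_l = A_l(\mathcal{C}_0, \mathcal{C}_1, \cdots, \mathcal{C}_l)$ the collection of all $\mathbf{x} \in \mathcal{X}^n$ satisfying
$(\mathbf{x}, \mathbf{v}_{k,j}) \in S_n^{(k)}$ for some $\mathbf{v}_{k,j}$ $(k = 0, 1, \cdots, l;\ j = 1, 2, \cdots, M_k)$. Then, $P_l$, the
expectation of $\Pr\{X^n \notin A_l\}$ with respect to the random code $(\mathcal{C}_0, \mathcal{C}_1, \cdots, \mathcal{C}_l)$,
can be written as

$$\begin{aligned}
P_l &= \sum_{\mathcal{C}_0, \mathcal{C}_1, \cdots, \mathcal{C}_l} \Pr\{\mathcal{C}_0, \mathcal{C}_1, \cdots, \mathcal{C}_l\} \sum_{\mathbf{x} \notin A_l} P_{X^n}(\mathbf{x}) \\
&= \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x}) \sum_{\mathcal{C}_0, \mathcal{C}_1, \cdots, \mathcal{C}_l : \mathbf{x} \notin A_l} \Pr\{\mathcal{C}_0, \mathcal{C}_1, \cdots, \mathcal{C}_l\} \\
&= \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x}) \prod_{k=0}^{l} \sum_{\mathcal{C}_k : \mathbf{x} \notin A_k(\mathcal{C}_k)} \Pr\{\mathcal{C}_k\},
\end{aligned} \tag{5.6.12}$$

where $A_k(\mathcal{C}_k)$ denotes the collection of all $\mathbf{x} \in \mathcal{X}^n$ satisfying $(\mathbf{x}, \mathbf{v}_{k,j}) \in S_n^{(k)}$
for some $j$ $(j = 1, 2, \cdots, M_k)$ and the last equality in (5.6.12) is obtained
because the random codes $\mathcal{C}_0, \mathcal{C}_1, \cdots, \mathcal{C}_l$ are independently generated. We next
evaluate the right-hand side of (5.6.12). Setting

$$J_l(\mathbf{x}, \mathbf{y}) = \begin{cases} 1 & \text{for } (\mathbf{x}, \mathbf{y}) \in S_n^{(l)}, \\ 0 & \text{otherwise,} \end{cases}$$

we have

$$\Pr\{(\mathbf{x}, \mathbf{v}_{k,j}) \notin S_n^{(k)}\} = 1 - \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n}(\mathbf{y}) J_k(\mathbf{x}, \mathbf{y})$$

for each $j = 1, 2, \cdots, M_k$. Then, (5.6.12) can be evaluated as

$$\begin{aligned}
P_l &= \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x}) \prod_{k=0}^{l} \left( 1 - \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n}(\mathbf{y}) J_k(\mathbf{x}, \mathbf{y}) \right)^{M_k} \\
&\le \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x}) \exp\left( -\sum_{k=0}^{l} M_k \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n}(\mathbf{y}) J_k(\mathbf{x}, \mathbf{y}) \right),
\end{aligned}$$

where the inequality follows from (5.2.20). We notice here that, since $J_k(\mathbf{x}, \mathbf{y}) = 1$
means $i(\mathbf{x}; \mathbf{y})/n \le k\gamma$, it holds that

$$P_{Y^n}(\mathbf{y}) \ge P_{Y^n|X^n}(\mathbf{y}|\mathbf{x}) K^{-nk\gamma}.$$

Thus, we have

$$\sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n}(\mathbf{y}) J_k(\mathbf{x}, \mathbf{y}) \ge K^{-nk\gamma} \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n|X^n}(\mathbf{y}|\mathbf{x}) J_k(\mathbf{x}, \mathbf{y}).$$

Accordingly, by using this and recalling (5.6.10) and (5.6.11), it follows that

$$P_l \le \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x}) \exp\left( -K^{n\gamma} \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n|X^n}(\mathbf{y}|\mathbf{x}) \sum_{k=0}^{l} J_k(\mathbf{x}, \mathbf{y}) \right).$$

By applying the inequality (5.2.21) to the right-hand side, we obtain

$$P_l \le 1 - \sum_{\mathbf{x} \in \mathcal{X}^n, \mathbf{y} \in \mathcal{Y}^n} P_{X^n Y^n}(\mathbf{x}, \mathbf{y}) \sum_{k=0}^{l} J_k(\mathbf{x}, \mathbf{y}) + \exp(-K^{n\gamma}), \tag{5.6.13}$$

which implies

$$Q_l \ge \sum_{\mathbf{x} \in \mathcal{X}^n, \mathbf{y} \in \mathcal{Y}^n} P_{X^n Y^n}(\mathbf{x}, \mathbf{y}) \sum_{k=0}^{l} J_k(\mathbf{x}, \mathbf{y}) - \exp(-K^{n\gamma}),$$

where

$$Q_l = 1 - P_l. \tag{5.6.14}$$

In particular, for the case of $l = L$ we have

$$\begin{aligned}
Q_L &\ge \sum_{\mathbf{x}, \mathbf{y}} P_{X^n Y^n}(\mathbf{x}, \mathbf{y}) \sum_{k=0}^{L} J_k(\mathbf{x}, \mathbf{y}) - \exp(-K^{n\gamma}) \\
&= \Pr\{X^n Y^n \in S_n\} - \exp(-K^{n\gamma}).
\end{aligned}$$

Note that $Q_L$ means the expectation with respect to the random code of the
probability of the collection of $\mathbf{x} \in \mathcal{X}^n$ such that there exists at least one
$(k, j)$ $(k = 0, 1, \cdots, L;\ j = 1, 2, \cdots, M_k)$ satisfying $(\mathbf{x}, \mathbf{v}_{k,j}) \in S_n^{(k)}$. Notice
that, by taking (5.6.9) into consideration, $Q_L$ can be written as

$$Q_L = 1 - \lambda_n \quad (\lambda_n \to 0 \text{ as } n \to \infty). \tag{5.6.15}$$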

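b) Encoding function $\tilde{\varphi}_n$:

For each $l = 0, 1, \cdots, L$ define

$$\mathcal{M}_l = \{1, 2, \cdots, M_l\}.$$

Then, due to (5.6.11) we can define a fixed-length encoding function

$$\varphi_n^{(l)} : \mathcal{M}_l \to \mathcal{U}^{n R_l}$$

as a one-to-one mapping, where $\|\varphi_n^{(l)}\| = M_l$. Furthermore, we prepare a
fixed-length encoding function

$$g : \{0, 1, \cdots, L\} \to \mathcal{U}^{m_L}$$

that describes the number $l = 0, 1, \cdots, L$, where $m_L = \log_K(L + 1)$. It is
clear that we can choose a one-to-one mapping as $g$.

Now, we define an encoding function $\tilde{\varphi}_n : \mathcal{X}^n \to \mathcal{U}^*$ in the following way.
For each $\mathbf{x} \in \mathcal{X}^n$, if there exists a $(k, j)$ $(k = 0, 1, \cdots, L;\ j = 1, 2, \cdots, M_k)$
satisfying $(\mathbf{x}, \mathbf{v}_{k,j}) \in S_n^{(k)}$, we define

$$\tilde{\varphi}_n(\mathbf{x}) = g(k_0) \varphi_n^{(k_0)}(j_0) \quad (\in \mathcal{U}^*), \tag{5.6.16}$$

where $k_0$ denotes the minimum of such $k$ and $j_0$ is arbitrarily chosen from $j$
satisfying $(\mathbf{x}, \mathbf{v}_{k_0,j}) \in S_n^{(k_0)}$. If there does not exist such $(k, j)$, we define

$$\tilde{\varphi}_n(\mathbf{x}) = g(0) \varphi_n^{(0)}(1) \quad (\in \mathcal{U}^*). \tag{5.6.17}$$

c) Decoding function $\tilde{\psi}_n$:

When a decoder receives the output $\mathbf{u} = \tilde{\varphi}_n(\mathbf{x})$ of the encoder $\tilde{\varphi}_n$ defined
by (5.6.16) and (5.6.17), the decoder can uniquely determine the pair $(k, j)$
$(k = 0, 1, \cdots, L;\ j = 1, 2, \cdots, M_k)$ satisfying $\mathbf{u} = g(k) \varphi_n^{(k)}(j)$ from such $\mathbf{u}$.
Defining this as $(k_0, j_0)$, we can define the decoder as the decoding function
$\tilde{\psi}_n : \mathcal{U}^* \to \mathcal{Y}^n$ satisfying $\tilde{\psi}_n(\mathbf{u}) = \mathbf{v}_{k_0, j_0}$.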
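The two-part codeword $g(k)\varphi_n^{(k)}(j)$ is a fixed-length header naming the slice followed by a body whose length depends only on the slice. A minimal Python sketch of this structure follows; the function names and the integer lengths `header_len` and `body_lens` are hypothetical stand-ins for $m_L$ and $nR_k$, which are treated as integers here.

```python
from typing import List, Sequence, Tuple


def to_digits(value: int, length: int, K: int) -> List[int]:
    """Fixed-length base-K expansion of value (most significant digit first)."""
    digits = []
    for _ in range(length):
        digits.append(value % K)
        value //= K
    if value:
        raise ValueError("value does not fit in the given length")
    return digits[::-1]


def from_digits(digits: Sequence[int], K: int) -> int:
    """Inverse of to_digits."""
    value = 0
    for d in digits:
        value = value * K + d
    return value


def encode_pair(k: int, j: int, header_len: int, body_lens: Sequence[int], K: int) -> List[int]:
    """Header for the slice index k, then the index j (1-based) within slice k."""
    return to_digits(k, header_len, K) + to_digits(j - 1, body_lens[k], K)


def decode_pair(word: Sequence[int], header_len: int, body_lens: Sequence[int], K: int) -> Tuple[int, int]:
    """Recover (k, j): the header fixes k, and k fixes the body length."""
    k = from_digits(word[:header_len], K)
    body = word[header_len:header_len + body_lens[k]]
    return k, from_digits(body, K) + 1


if __name__ == "__main__":
    word = encode_pair(2, 5, header_len=2, body_lens=[1, 2, 3], K=2)
    print(word, decode_pair(word, 2, [1, 2, 3], 2))
```

Because the header has a fixed length and determines the body length, the resulting set of codewords satisfies the prefix condition, which is what makes the decoder's recovery of $(k_0, j_0)$ unique.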

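For the variable-length code $(\tilde{\varphi}_n, \tilde{\psi}_n)$ defined above, it follows from
(5.6.15) and the definition of $S_n$ that

$$\begin{aligned}
\Pr\left\{ \frac{1}{n} d_n(X^n, \tilde{\psi}_n(\tilde{\varphi}_n(X^n))) \le \overline{D}(\mathbf{X}, \mathbf{Y}) + \gamma \right\}
&= \Pr\{(X^n, \tilde{\psi}_n(\tilde{\varphi}_n(X^n))) \in D_n\} \\
&\ge \Pr\{(X^n, \tilde{\psi}_n(\tilde{\varphi}_n(X^n))) \in S_n\} \\
&= Q_L = 1 - \lambda_n.
\end{aligned}$$

Thus, we obtain

$$\Pr\left\{ \frac{1}{n} d_n(X^n, \tilde{\psi}_n(\tilde{\varphi}_n(X^n))) \le D + \gamma \right\} \ge 1 - \lambda_n \tag{5.6.18}$$

from (5.6.8) (note that $\lambda_n \to 0$ as $n \to \infty$).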

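d) Evaluation of coding performance:

First, note that the probability of $\tilde{\psi}_n(\tilde{\varphi}_n(X^n)) = \mathbf{v}_{l,j}$ for some $j =
1, 2, \cdots, M_l$ is given by $Q_l - Q_{l-1}$ for $l = 1, 2, \cdots, L$ and $Q_0 + \lambda_n$ for $l = 0$.
Then, the average length of $\tilde{\varphi}_n(X^n)$ is evaluated as

$$\begin{aligned}
E_{\mathcal{C}} E|\tilde{\varphi}_n(X^n)|
&= \log_K(L + 1) + \sum_{l=1}^{L} n R_l (Q_l - Q_{l-1}) + n R_0 (Q_0 + \lambda_n) \\
&= \log_K(L + 1) + n\gamma \sum_{l=1}^{L} (Q_l - Q_{l-1})(l + 1) + n\gamma (Q_0 + \lambda_n),
\end{aligned} \tag{5.6.19}$$

where the last equality follows from (5.6.10) and $E_{\mathcal{C}}$ and $E$ denote the
expectations with respect to the random code $\mathcal{C} = (\mathcal{C}_0, \mathcal{C}_1, \cdots, \mathcal{C}_L)$ and $X^n$,
respectively. The second term on the right-hand side of (5.6.19) is evaluated
in the following way:

$$\begin{aligned}
n\gamma \sum_{l=1}^{L} (Q_l - Q_{l-1})(l + 1)
&= n\gamma \sum_{l=1}^{L} (Q_l - Q_{l-1}) l + n\gamma \sum_{l=1}^{L} (Q_l - Q_{l-1}) \\
&= n\gamma \sum_{l=1}^{L} (Q_l - Q_{l-1}) l + n\gamma (Q_L - Q_0) \\
&= n\gamma \sum_{l=1}^{L} (P_{l-1} - P_l) l + n\gamma (P_0 - P_L) \\
&= n\gamma \left( \sum_{l=0}^{L-1} P_l - L P_L \right) + n\gamma (P_0 - P_L) \\
&= n\gamma \sum_{l=0}^{L-1} P_l - n\gamma (L + 1) P_L + n\gamma P_0 \\
&\le n\gamma \sum_{l=0}^{L-1} P_l + n\gamma.
\end{aligned} \tag{5.6.20}$$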

1=0 

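On the other hand, since

$$\sum_{\mathbf{x} \in \mathcal{X}^n, \mathbf{y} \in \mathcal{Y}^n} P_{X^n Y^n}(\mathbf{x}, \mathbf{y}) \sum_{k=0}^{l} J_k(\mathbf{x}, \mathbf{y}) = \Pr\left\{ (X^n, Y^n) \in D_n \text{ and } -\gamma < \frac{1}{n} i(X^n; Y^n) \le l\gamma \right\},$$

it follows that

$$\begin{aligned}
1 - \sum_{\mathbf{x} \in \mathcal{X}^n, \mathbf{y} \in \mathcal{Y}^n} P_{X^n Y^n}(\mathbf{x}, \mathbf{y}) \sum_{k=0}^{l} J_k(\mathbf{x}, \mathbf{y})
&\le \Pr\{(X^n, Y^n) \notin D_n\} + \Pr\left\{ \frac{1}{n} i(X^n; Y^n) \le -\gamma \right\} + \Pr\left\{ \frac{1}{n} i(X^n; Y^n) > l\gamma \right\} \\
&\le \Pr\left\{ \frac{1}{n} i(X^n; Y^n) > l\gamma \right\} + \gamma_n + \mu_n.
\end{aligned} \tag{5.6.21}$$

By recalling that $\gamma_n \to 0$ and $\mu_n \to 0$ as $n \to \infty$ from (5.6.9), we obtain

$$P_l \le \Pr\left\{ \frac{1}{n} i(X^n; Y^n) > l\gamma \right\} + \beta_n \tag{5.6.22}$$

from (5.6.13), where

$$\beta_n = \gamma_n + \mu_n + \exp(-K^{n\gamma}).$$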

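Clearly, $\beta_n \to 0$ as $n \to \infty$. Substitution of (5.6.22) into (5.6.20) yields

$$\begin{aligned}
n\gamma \sum_{l=1}^{L} (Q_l - Q_{l-1})(l + 1)
&\le n\gamma \sum_{l=0}^{L-1} \Pr\left\{ \frac{1}{n} i(X^n; Y^n) > l\gamma \right\} + n\gamma (L\beta_n + 1) \\
&\le n \int_{-\gamma}^{(L-1)\gamma} \Pr\left\{ \frac{1}{n} i(X^n; Y^n) > x \right\} dx + n\gamma (L\beta_n + 1) \\
&\le n \int_{-\gamma}^{\infty} \Pr\left\{ \frac{1}{n} i(X^n; Y^n) > x \right\} dx + n\gamma (L\beta_n + 1) \\
&= n E\left\{ \frac{1}{n} i(X^n; Y^n) \right\} + n\gamma (L\beta_n + 2) \\
&= I(X^n; Y^n) + n\gamma (L\beta_n + 2),
\end{aligned} \tag{5.6.23}$$

where in the above we have used Lemma 5.5.1 with $c = -\gamma$ and $Z = \frac{1}{n} i(X^n; Y^n)$.
Then, substitution of (5.6.23) into (5.6.19) yields

$$\begin{aligned}
E_{\mathcal{C}} E|\tilde{\varphi}_n(X^n)|
&\le I(X^n; Y^n) + n\gamma (L\beta_n + \lambda_n + 3) + \log_K(L + 1) \\
&\le I(X^n; Y^n) + 4n\gamma
\end{aligned}$$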

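for sufficiently large $n$. If we apply Markov's inequality given in Remark 1.1.1
in Chapter 1, we obtain

$$\Pr\left\{ E|\tilde{\varphi}_n(X^n)| > \frac{I(X^n; Y^n) + 4n\gamma}{1 - (\sqrt{\lambda_n} + \alpha_n)} \right\} \le 1 - (\sqrt{\lambda_n} + \alpha_n),$$

that is,

$$\Pr\left\{ E|\tilde{\varphi}_n(X^n)| \le \frac{I(X^n; Y^n) + 4n\gamma}{1 - (\sqrt{\lambda_n} + \alpha_n)} \right\} \ge \sqrt{\lambda_n} + \alpha_n, \tag{5.6.24}$$

where $\{\alpha_n\}_{n=1}^{\infty}$ is an arbitrary positive sequence satisfying $\alpha_n \to 0$ as $n \to \infty$.
Here the outer probability is taken with respect to the random code $\mathcal{C}$.
After rewriting (5.6.18) as

$$E_{\mathcal{C}} \Pr\left\{ \frac{1}{n} d_n(X^n, \tilde{\psi}_n(\tilde{\varphi}_n(X^n))) \le D + \gamma \,\middle|\, \mathcal{C} \right\} \ge 1 - \lambda_n, \tag{5.6.25}$$

we use the following lemma.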

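Lemma 5.6.1 (Markov's reverse inequality). Let $Z$ be a random variable
taking values in $\mathcal{Z}$ and satisfying $0 \le Z \le 1$. If $Z$ satisfies $E(Z) \ge 1 - \mu$
$(0 \le \mu \le 1)$, it holds that

$$\Pr\{Z \ge 1 - \sqrt{\mu}\} \ge 1 - \sqrt{\mu}.$$

Proof. Define

$$B = \{z \in \mathcal{Z} \mid z \ge 1 - \sqrt{\mu}\}.$$

Then, it follows that

$$\begin{aligned}
E(Z) &= \sum_{z \in \mathcal{Z}} z P_Z(z) = \sum_{z \in B} z P_Z(z) + \sum_{z \notin B} z P_Z(z) \\
&\le \sum_{z \in B} P_Z(z) + \sum_{z \notin B} (1 - \sqrt{\mu}) P_Z(z) \\
&= \Pr\{B\} + (1 - \sqrt{\mu})(1 - \Pr\{B\}) \\
&= \sqrt{\mu} \Pr\{B\} + (1 - \sqrt{\mu}).
\end{aligned}$$

This, together with $E(Z) \ge 1 - \mu$, implies that $\Pr\{B\} \ge 1 - \sqrt{\mu}$. □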
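Lemma 5.6.1 is easy to check numerically for a random variable with finitely many values in $[0, 1]$. The following Python sketch is an illustration only; the function name `reverse_markov_check` is an assumption made here, and $\mu$ is taken to be exactly $1 - E(Z)$, the smallest value allowed by the hypothesis.

```python
import math
from typing import Sequence, Tuple


def reverse_markov_check(values: Sequence[float], probs: Sequence[float]) -> Tuple[float, float]:
    """Return (Pr{Z >= 1 - sqrt(mu)}, 1 - sqrt(mu)) with mu = 1 - E(Z).

    Assumes 0 <= Z <= 1. The lemma asserts that the first returned number
    is at least the second.
    """
    mean = sum(v * p for v, p in zip(values, probs))
    mu = max(0.0, 1.0 - mean)  # guard against tiny negative rounding errors
    threshold = 1.0 - math.sqrt(mu)
    mass = sum(p for v, p in zip(values, probs) if v >= threshold)
    return mass, threshold


if __name__ == "__main__":
    print(reverse_markov_check([0.0, 0.5, 1.0], [0.1, 0.2, 0.7]))
```

In the proof of Theorem 5.6.1 the role of $Z$ is played by the conditional probability given the random code, and $\mu = \lambda_n$.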



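Now, we define

$$Z = \Pr\left\{ \frac{1}{n} d_n(X^n, \tilde{\psi}_n(\tilde{\varphi}_n(X^n))) \le D + \gamma \,\middle|\, \mathcal{C} \right\}$$

and apply Lemma 5.6.1 to (5.6.25), setting $\mu = \lambda_n$. Then, we obtain

$$\Pr\left\{ \Pr\left\{ \frac{1}{n} d_n(X^n, \tilde{\psi}_n(\tilde{\varphi}_n(X^n))) \le D + \gamma \,\middle|\, \mathcal{C} \right\} \ge 1 - \sqrt{\lambda_n} \right\} \ge 1 - \sqrt{\lambda_n}. \tag{5.6.26}$$

Since the sum of the right-hand sides of (5.6.24) and (5.6.26) is equal to
$1 + \alpha_n$ and is greater than 1, there must exist at least one deterministic code
$(\varphi_n, \psi_n)$ satisfying

$$E|\varphi_n(X^n)| \le \frac{I(X^n; Y^n) + 4n\gamma}{1 - (\sqrt{\lambda_n} + \alpha_n)}, \tag{5.6.27}$$

$$\Pr\left\{ \frac{1}{n} d_n(X^n, \psi_n(\varphi_n(X^n))) \le D + \gamma \right\} \ge 1 - \sqrt{\lambda_n}. \tag{5.6.28}$$

By noticing $\lambda_n \to 0$ and $\alpha_n \to 0$ as $n \to \infty$, we obtain

$$\begin{aligned}
\limsup_{n \to \infty} \frac{1}{n} E|\varphi_n(X^n)| &\le \limsup_{n \to \infty} \frac{1}{n} I(X^n; Y^n) + 4\gamma \\
&= I(\mathbf{X}; \mathbf{Y}) + 4\gamma \\
&\le R + 4\gamma
\end{aligned}$$

from (5.6.7) and (5.6.27), and

$$\operatorname*{p-lim\,sup}_{n \to \infty} \frac{1}{n} d_n(X^n, \psi_n(\varphi_n(X^n))) \le D + \gamma$$

from (5.6.28).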

In order to complete the proof we choose a positive sequence $\{\gamma_k\}_{k=1}^{\infty}$ satisfying
$\gamma_1 > \gamma_2 > \cdots > 0$ and $\gamma_k \to 0$ as $k \to \infty$ and repeat the argument above
in the order of $\gamma = \gamma_1, \gamma = \gamma_2, \cdots$ instead of using $\gamma > 0$. This argument
develops the existence of the variable-length code $(\varphi_n, \psi_n)$ satisfying (5.6.3)
and (5.6.4). Now, we can conclude that

$$R_{vm}(D|\mathbf{X}) \le \inf_{\mathbf{Y} : \overline{D}(\mathbf{X}, \mathbf{Y}) \le D} I(\mathbf{X}; \mathbf{Y})$$

(the diagonal line argument). □



5.7 Rate-Distortion Function $R_{va}(D|\mathbf{X})$

Finally, we treat the variable-length coding under the average distortion
criterion. In order to establish a general formula of this case, we need to extend
the class of encoders. That is, we need to consider the class of stochastic
encoders $\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$ satisfying the prefix condition. For each source
output $\mathbf{x} \in \mathcal{X}^n$ a stochastic encoder does not uniquely determine a codeword
for $\mathbf{x}$ but generates a codeword subject to a probability distribution $P_n(\cdot|\mathbf{x})$
depending on $\mathbf{x}$, where

$$\mathcal{C}_n = \{\mathbf{u} \in \mathcal{U}^* \mid P_n(\mathbf{u}|\mathbf{x}) > 0 \text{ for some } \mathbf{x} \in \mathcal{X}^n\}$$

is required to satisfy the prefix condition. Let $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ be a general
source taking values in reproduction alphabets $\{\mathcal{Y}^n\}_{n=1}^{\infty}$ and define $D(\mathbf{X}, \mathbf{Y})$
and $I(\mathbf{X}; \mathbf{Y})$ by (5.5.1) and (5.6.1), respectively. Then, we have the following
theorem:

Theorem 5.7.1 (Han [35]). For any distortion measure $d_n$ satisfying the
uniform integrability (5.3.4), it holds that

$$R_{va}(D|\mathbf{X}) = \inf_{\mathbf{Y} : D(\mathbf{X}, \mathbf{Y}) \le D} I(\mathbf{X}; \mathbf{Y}), \tag{5.7.1}$$

where a variable-length encoder $\varphi_n$ can be a stochastic encoder satisfying the
prefix condition.

Remark 5.7.1. Hashimoto [48] shows a claim corresponding to Theorem
5.7.1 under the condition that the distortion measure $d_n$ is additive and
there exists an $r \in \mathcal{Y}$ satisfying $\sup_{x \in \mathcal{X}} d_1(x, r) < +\infty$. This condition is much
stronger than the uniform integrability (5.3.4). □

Remark 5.7.2. If (5.7.1) is expressed in a form of the distortion-rate
function, we have

$$D_{va}(R|\mathbf{X}) = \inf_{\mathbf{Y} : I(\mathbf{X}; \mathbf{Y}) \le R} D(\mathbf{X}, \mathbf{Y}). \tag{5.7.2}$$



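Remark 5.7.3. The right-hand side of (5.7.1) can be expressed as

$$\inf_{\mathbf{Y} : D(\mathbf{X}, \mathbf{Y}) \le D} I(\mathbf{X}; \mathbf{Y}) = \limsup_{n \to \infty} \inf_{Y^n : \frac{1}{n} E d_n(X^n, Y^n) \le D} \frac{1}{n} I(X^n; Y^n), \tag{5.7.3}$$

which can be verified in the following way. Since $D(\mathbf{X}, \mathbf{Y}) \le D$ implies that

$$\frac{1}{n} E d_n(X^n, Y^n) \le D + \gamma \quad (\forall n \ge n_0),$$

$$\frac{1}{n} I(X^n; Y^n) \le I(\mathbf{X}; \mathbf{Y}) + \gamma \quad (\forall n \ge n_0)$$

for an arbitrarily small $\gamma > 0$, it follows that

$$\inf_{\mathbf{Y} : D(\mathbf{X}, \mathbf{Y}) \le D} I(\mathbf{X}; \mathbf{Y}) \ge \inf_{Y^n : \frac{1}{n} E d_n(X^n, Y^n) \le D + \gamma} \frac{1}{n} I(X^n; Y^n) - \gamma \quad (\forall n \ge n_0).$$

Hence,

$$\inf_{\mathbf{Y} : D(\mathbf{X}, \mathbf{Y}) \le D} I(\mathbf{X}; \mathbf{Y}) \ge \limsup_{n \to \infty} \inf_{Y^n : \frac{1}{n} E d_n(X^n, Y^n) \le D + \gamma} \frac{1}{n} I(X^n; Y^n) - \gamma. \tag{5.7.4}$$

On the other hand, since for each $n = 1, 2, \cdots$ there exists a $Y^n$ satisfying

$$\inf_{Y^n : \frac{1}{n} E d_n(X^n, Y^n) \le D + \gamma} \frac{1}{n} I(X^n; Y^n) \ge \frac{1}{n} I(X^n; Y^n) - \gamma, \tag{5.7.5}$$

$$\frac{1}{n} E d_n(X^n, Y^n) \le D + \gamma, \tag{5.7.6}$$

(5.7.5) gives rise to

$$\begin{aligned}
\limsup_{n \to \infty} \inf_{Y^n : \frac{1}{n} E d_n(X^n, Y^n) \le D + \gamma} \frac{1}{n} I(X^n; Y^n) - \gamma
&\ge \limsup_{n \to \infty} \frac{1}{n} I(X^n; Y^n) - 2\gamma \\
&= I(\mathbf{X}; \mathbf{Y}) - 2\gamma,
\end{aligned} \tag{5.7.7}$$

where $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$. By noticing that (5.7.6) means $D(\mathbf{X}, \mathbf{Y}) \le D + \gamma$, we
have

$$I(\mathbf{X}; \mathbf{Y}) \ge \inf_{\mathbf{Y} : D(\mathbf{X}, \mathbf{Y}) \le D + \gamma} I(\mathbf{X}; \mathbf{Y}). \tag{5.7.8}$$

Thus, in view of (5.7.4), (5.7.7) and (5.7.8) we obtain

$$\begin{aligned}
\inf_{\mathbf{Y} : D(\mathbf{X}, \mathbf{Y}) \le D} I(\mathbf{X}; \mathbf{Y})
&\ge \limsup_{n \to \infty} \inf_{Y^n : \frac{1}{n} E d_n(X^n, Y^n) \le D + \gamma} \frac{1}{n} I(X^n; Y^n) - \gamma \\
&\ge \inf_{\mathbf{Y} : D(\mathbf{X}, \mathbf{Y}) \le D + \gamma} I(\mathbf{X}; \mathbf{Y}) - 2\gamma.
\end{aligned} \tag{5.7.9}$$

We note here that the quantity in the middle of (5.7.9) is convex with respect
to $\gamma > 0$ and therefore is continuous with respect to $\gamma > 0$. We also note that
the rightmost side of (5.7.9) is right-continuous with respect to $\gamma > 0$ from
Remark 5.7.5 below. We obtain (5.7.3) by letting $\gamma \to 0$ in (5.7.9).

Similarly, we can show that the right-hand side of (5.7.2) can be expressed
as

$$\inf_{\mathbf{Y} : I(\mathbf{X}; \mathbf{Y}) \le R} D(\mathbf{X}, \mathbf{Y}) = \limsup_{n \to \infty} \inf_{Y^n : \frac{1}{n} I(X^n; Y^n) \le R} \frac{1}{n} E d_n(X^n, Y^n). \tag{5.7.10}$$

It is easy to verify that the right-hand sides of (5.7.3) and (5.7.10) are convex
with respect to $D$ and $R$, respectively (cf. Gray [31]). □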



Remark 5.7.4. The three kinds of rate-distortion functions

$$R_{fm}(D|\mathbf{X}), \quad R_{fa}(D|\mathbf{X}), \quad R_{vm}(D|\mathbf{X})$$

defined so far are not always convex with respect to $D$. However, $R_{va}(D|\mathbf{X})$ is
always convex with respect to $D$ by virtue of Theorem 5.7.1 and Remark 5.7.3. □



Proof of Theorem 5.7.1.

1) Converse part:

Suppose that $(R, D)$ is $va$-achievable. That is, suppose that a
variable-length code $(\varphi_n, \psi_n)$ satisfying

$$\limsup_{n \to \infty} \frac{1}{n} E|\varphi_n(X^n)| \le R, \tag{5.7.11}$$

$$\limsup_{n \to \infty} \frac{1}{n} E d_n(X^n, \psi_n(\varphi_n(X^n))) \le D \tag{5.7.12}$$

is given. Setting $Y^n = \psi_n(\varphi_n(X^n))$ and $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$, (5.7.12) immediately
can be written as

$$D(\mathbf{X}, \mathbf{Y}) \le D. \tag{5.7.13}$$

On the other hand, since $\varphi_n$ satisfies the prefix condition, we obtain

$$I(\mathbf{X}; \mathbf{Y}) \le R \tag{5.7.14}$$

from (5.7.11) in the same way as the proof 1) of Theorem 5.6.1. The
combination of (5.7.13) and (5.7.14) means that

$$R_{va}(D|\mathbf{X}) \ge \inf_{\mathbf{Y} : D(\mathbf{X}, \mathbf{Y}) \le D} I(\mathbf{X}; \mathbf{Y}).$$



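2) Direct part:

In order to establish the direct part, it is sufficient to prove that an
arbitrary $(R, D)$ satisfying

$$I(\mathbf{X}; \mathbf{Y}) + 2\gamma \le R, \tag{5.7.15}$$

$$D(\mathbf{X}, \mathbf{Y}) \le D \tag{5.7.16}$$

is $va$-achievable, where $\gamma > 0$ is an arbitrarily small constant and $\mathbf{Y} =
\{Y^n\}_{n=1}^{\infty}$ is an arbitrarily given general source taking values in reproduction
alphabets $\{\mathcal{Y}^n\}_{n=1}^{\infty}$. To this end, we use the information-spectrum slicing
technique similarly to the proof of the direct part of Theorem 5.6.1. We set

$$i(\mathbf{x}; \mathbf{y}) = \log_K \frac{P_{Y^n|X^n}(\mathbf{y}|\mathbf{x})}{P_{Y^n}(\mathbf{y})} \quad (\mathbf{x} \in \mathcal{X}^n, \mathbf{y} \in \mathcal{Y}^n)$$

for simplicity and define

$$S_n = \left\{ (\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n \,\middle|\, -\gamma < \frac{1}{n} i(\mathbf{x}; \mathbf{y}) \le \overline{I}(\mathbf{X}; \mathbf{Y}) + \gamma \right\},$$

where $\gamma > 0$ is an arbitrarily small constant. Then, by using $\underline{I}(\mathbf{X}; \mathbf{Y}) \ge 0$
and the definitions of $\overline{I}(\mathbf{X}; \mathbf{Y})$ and $\underline{I}(\mathbf{X}; \mathbf{Y})$, we have

$$\mu_n \equiv \Pr\{X^n Y^n \notin S_n\} \to 0 \quad \text{as } n \to \infty. \tag{5.7.17}$$

Setting $L = (\overline{I}(\mathbf{X}; \mathbf{Y}) + \gamma)/\gamma$, we partition $S_n$ into $L + 1$ subsets defined
by

$$S_n^{(l)} = \left\{ (\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n \,\middle|\, (l - 1)\gamma < \frac{1}{n} i(\mathbf{x}; \mathbf{y}) \le l\gamma \right\} \quad (l = 0, 1, \cdots, L)$$

(information-spectrum slicing). Clearly, it holds that

$$S_n = \bigcup_{l=0}^{L} S_n^{(l)}.$$

We will construct a (stochastic) variable-length code $(\tilde{\varphi}_n, \tilde{\psi}_n)$ based on this
partition of the interval similarly to the proof of the direct part of
Theorem 5.6.1.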



a) Generation of a random code: 

For each / = 0, 1, • • • , L set 

Ri = (? + l)7, (5.7.18) 

(5.7.19) 

where K = |Z//|and generate Mi codewords v/^i, v/^2? • • • Xl,Mi ^ indepen- 
dently subject to the probability distribution Py^ {random coding). Dehne 



5.7 Rate-Distortion Function Rva{D\X.) 361 



Cl = 

C = {Co, Cl, ■ ■ ■ ,Cl} . 

b) Encoding function Tp^: 

Setting 

for each I = 0, 1, • • • , L, (5.7.19) tells us that we can define a fixed-length 
encoding function 

: Ml ^ 

as a one-to-one mapping with | 1 1 = M; . In addition, we prepare a fixed- 

length encoding function 

for discriminating I = 0, 1, • • • , L, where rriL = logj^{L + 1). It is clear that 
we can choose a one-to-one mapping as g. 

Now, we define a variable-length encoding function $\tilde\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$. Denote by $\mathbf{r}^{(n)} \in \mathcal{Y}^n$ $(n = 1, 2, \ldots)$ the reference word guaranteed by the assumption of the uniform integrability of the distortion measure $d_n$. Given a source output $\mathbf{x} \in \mathcal{X}^n$, the encoder $\tilde\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$ first generates $\mathbf{w} \in \mathcal{Y}^n$ randomly subject to the conditional probability distribution $P_{Y^n|X^n}(\cdot|\mathbf{x})$ and then outputs a codeword according to the following rules (stochastic encoder):

i) The case of $(\mathbf{x}, \mathbf{w}) \in S_n^{(k_0)}$ for some $k_0 = 0, 1, \ldots, L$:

If there exists at least one $j = 1, 2, \ldots, M_{k_0}$ satisfying

(x.Vfeoj) e and d„(x,Vfcoj) < d„(x,w), (5.7.20) 

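we define $\tilde\varphi_n(\mathbf{x}) = 1\,g(k_0)\,\varphi_n^{(k_0)}(j_0)$, where $j_0$ is one of such $j$. If there exists no such $j = 1, 2, \ldots, M_{k_0}$, we define $\tilde\varphi_n(\mathbf{x}) = 0$.

ii) The case of $(\mathbf{x}, \mathbf{w}) \notin S_n^{(k)}$ for all $k = 0, 1, \ldots, L$:

We define $\tilde\varphi_n(\mathbf{x}) = 0$.

c) Decoding function $\tilde\psi_n$:

When the decoder $\tilde\psi_n : \mathcal{U}^* \to \mathcal{Y}^n$ receives an output $u\mathbf{u} = \tilde\varphi_n(\mathbf{x})$ from the encoder $\tilde\varphi_n$ determined by i) and ii) above, the decoder first checks the leftmost symbol $u$. If $u = 0$, the decoder outputs $\tilde\psi_n(u\mathbf{u}) = \mathbf{r}^{(n)}$. If $u = 1$, the decoder reproduces $(k_0, j_0)$ from $\mathbf{u} = g(k_0)\varphi_n^{(k_0)}(j_0)$ following $u$ and outputs $\tilde\psi_n(u\mathbf{u}) = \mathbf{v}_{k_0,j_0}$. Such $(k_0, j_0)$ is uniquely determined from $\mathbf{u}$.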

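d) Evaluation of coding performance:

First, define an event $T_n(\mathcal{C})$ by

$$T_n(\mathcal{C}) = \left\{ (\mathbf{x},\mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n \;\middle|\; \tilde\varphi_n(\mathbf{x}) = 0 \text{ and } (\mathbf{x},\mathbf{y}) \in S_n \right\}.$$

Note here that $T_n(\mathcal{C})$ is an event depending on the random code $\mathcal{C}$. Then, by using the definition of the encoder $\tilde\varphi_n$, for each realization of $\mathcal{C}$ the average codeword length $E|\tilde\varphi_n(X^n)|$ is evaluated as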

L 

< 53 Pr {(X", w) G n T^{C)] (log^(L + 1) + nRi + 1) 



/=0 



+ 



L, 

53Pr{(X”,w) € 5^ nr„(C)} +Pr{(X”,w) 



z=o 

L 



< n ^ Pr { (X- , w) 6 5 W } (Z + 1)7 + Pr { (X” , w) ^ 5„ } 



z=o 

+ log;^(I/ + 1) + 1 

L 
n 



53 Pr {x"y” € 5^ } (/ + 1)7 + Pr {X”X” ^ 5„} 



z=o 



(5.7.21) 



+ logic + 1) -h 1. 

We can evaluate the first term on the right-hand side of (5.7.21) in the fol- 
lowing way: 

L 



Ep^{ X"F” € 5 ^} {I + 1)7 






< 



^Pr{x”y«e5W}a-l)7 + 27 



z=o 



[n 



< E { -*(X”; y”)l I -7 < P") < -f(X; Y) + 7 



27 



eZ -i(X”;y”)l 



[n 



-i(X”;y") </(X;Y) + 7 

n 



E<^ -i(X";y")l 



-i(X";Y”) 



n 



< -7 I 



+ 27 



< i/(X”;y") + Tiog^e + 27, 

n ne 



(5.7.22) 



where the last inequality follows from Lemma 3.2.4. By substituting (5.7.17) and (5.7.22) into (5.7.21), it follows that

$$E|\tilde\varphi_n(X^n)| \le I(X^n;Y^n) + \frac{1}{e}\log_K e + 2n\gamma + \log_K(L+1) + \mu_n + 1.$$

Therefore, we obtain

$$\frac{1}{n}E|\tilde\varphi_n(X^n)| \le \frac{1}{n}I(X^n;Y^n) + 2\gamma + \lambda_n, \tag{5.7.23}$$

where

$$\lambda_n = \frac{1}{ne}\log_K e + \frac{1}{n}\log_K(L+1) + \frac{\mu_n + 1}{n}. \tag{5.7.24}$$

Clearly, $\lambda_n \to 0$ as $n \to \infty$. Hereafter, we evaluate the average distortion of the code $(\tilde\varphi_n, \tilde\psi_n)$. Taking (5.7.20) and $\mathbf{w}$ being randomly generated subject to the conditional probability distribution $P_{Y^n|X^n}(\cdot|\mathbf{x})$ into consideration, for each realization of the random code $\mathcal{C}$ the average distortion can be evaluated as

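$$\begin{aligned}
\frac{1}{n}E d_n(X^n, \tilde\psi_n(\tilde\varphi_n(X^n)))
&\le \frac{1}{n}\sum_{(\mathbf{x},\mathbf{y}) \in S_n \cap T_n^c(\mathcal{C})} P_{X^nY^n}(\mathbf{x},\mathbf{y})\, d_n(\mathbf{x},\mathbf{y}) \\
&\quad + \frac{1}{n}\sum_{(\mathbf{x},\mathbf{y}) \in S_n \cap T_n(\mathcal{C})} P_{X^nY^n}(\mathbf{x},\mathbf{y})\, d_n(\mathbf{x},\mathbf{r}^{(n)}) \\
&\quad + \frac{1}{n}\sum_{(\mathbf{x},\mathbf{y}) \in S_n^c} P_{X^nY^n}(\mathbf{x},\mathbf{y})\, d_n(\mathbf{x},\mathbf{r}^{(n)}) \\
&\le \frac{1}{n}\sum_{(\mathbf{x},\mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n} P_{X^nY^n}(\mathbf{x},\mathbf{y})\, d_n(\mathbf{x},\mathbf{y}) \\
&\quad + \frac{1}{n}\sum_{(\mathbf{x},\mathbf{y}) \in T_n(\mathcal{C})} P_{X^nY^n}(\mathbf{x},\mathbf{y})\, d_n(\mathbf{x},\mathbf{r}^{(n)}) \\
&\quad + \frac{1}{n}\sum_{(\mathbf{x},\mathbf{y}) \in S_n^c} P_{X^nY^n}(\mathbf{x},\mathbf{y})\, d_n(\mathbf{x},\mathbf{r}^{(n)}) \\
&= \frac{1}{n}E d_n(X^n, Y^n) \\
&\quad + \sum_{(\mathbf{x},\mathbf{y}) \in T_n(\mathcal{C})} P_{X^nY^n}(\mathbf{x},\mathbf{y}) \left(\frac{1}{n} d_n(\mathbf{x},\mathbf{r}^{(n)})\right) \\
&\quad + \sum_{(\mathbf{x},\mathbf{y}) \in S_n^c} P_{X^nY^n}(\mathbf{x},\mathbf{y}) \left(\frac{1}{n} d_n(\mathbf{x},\mathbf{r}^{(n)})\right).
\end{aligned} \tag{5.7.25}$$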

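In order to evaluate the second term on the right-hand side of (5.7.25), let us develop an upper bound of

$$E_{\mathcal{C}}\left(\Pr\{(X^n, \mathbf{w}) \in T_n(\mathcal{C})\}\right),$$

where $E_{\mathcal{C}}$ denotes the expectation with respect to the random code $\mathcal{C}$. Set

$$Q^{(l)}(u|\mathbf{x},\mathcal{C}) = \Pr\left\{ \frac{1}{n} d_n(\mathbf{x},\mathbf{v}_{l,j}) > u \text{ for } \forall j = 1, 2, \ldots, M_l \right\}$$

and notice

$$\Pr\left\{ (\mathbf{x},\mathbf{w}) \in T_n(\mathcal{C}) \;\middle|\; (\mathbf{x},\mathbf{w}) \in S_n^{(l)} \text{ and } \frac{1}{n} d_n(\mathbf{x},\mathbf{w}) = u \right\} = Q^{(l)}(u|\mathbf{x},\mathcal{C}).$$

Then, $E_{\mathcal{C}}(\Pr\{(X^n, \mathbf{w}) \in T_n(\mathcal{C})\})$ can be expressed as

$$\begin{aligned}
E_{\mathcal{C}}\left(\Pr\{(X^n,\mathbf{w}) \in T_n(\mathcal{C})\}\right)
&= E_{\mathcal{C}}\left( \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x}) \sum_{l=0}^{L} \Pr\{(\mathbf{x},\mathbf{w}) \in S_n^{(l)}\} \right. \\
&\qquad\quad \left. \times \Pr\left\{ (\mathbf{x},\mathbf{w}) \in T_n(\mathcal{C}) \;\middle|\; (\mathbf{x},\mathbf{w}) \in S_n^{(l)} \right\} \right) \\
&= E_{\mathcal{C}}\left( \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x}) \sum_{l=0}^{L} \Pr\{(\mathbf{x},\mathbf{w}) \in S_n^{(l)}\} \int_0^\infty Q^{(l)}(u|\mathbf{x},\mathcal{C})\, dF_n^{(l)}(u|\mathbf{x}) \right) \\
&= \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x}) \sum_{l=0}^{L} \Pr\{(\mathbf{x},\mathbf{w}) \in S_n^{(l)}\} \int_0^\infty E_{\mathcal{C}}\left(Q^{(l)}(u|\mathbf{x},\mathcal{C})\right) dF_n^{(l)}(u|\mathbf{x}),
\end{aligned} \tag{5.7.26}$$

where

$$F_n^{(l)}(u|\mathbf{x}) = \Pr\left\{ \frac{1}{n} d_n(\mathbf{x},\mathbf{w}) \le u \;\middle|\; (\mathbf{x},\mathbf{w}) \in S_n^{(l)} \right\}$$

is the conditional cumulative distribution function of $\frac{1}{n} d_n(\mathbf{x},\mathbf{w})$ under the condition $(\mathbf{x},\mathbf{w}) \in S_n^{(l)}$. By noticing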

Pr I id„(x, r”) < zij > Pr |^rfn(x, T") < u and (x, T”) G 5^ | 

and applying the same argument using the inequality (5.2.20) given in the 
proof of the direct part of Theorem 5.5.1, Ec [Q^^\u\x, C)) on the right-hand 
side of (5.7.26) can be evaluated in the following way: 

Ec(q(')(«|x,C)) 

< exp 1^— e""'' Pr •(— d„(x, w) < u and (x, w) G 
= exp Pr { (x, w) € | 

X Pr|id„(x,w) < u (x,w) G Sd'l 



exp 

exp 



-e"'’' Pr I (x, w) G Sd'> I pW [u\x) 






«n^x)P,('>(u|x) 






where we set

$$a_n^{(l)}(\mathbf{x}) = \Pr\{(\mathbf{x},\mathbf{w}) \in S_n^{(l)}\}$$

for simplicity. Since $F_n^{(l)}(u|\mathbf{x})$ is monotone nondecreasing and right-continuous with respect to $u$, we have



I" Ec {q^^\u\k,C)) dFi^\u\^) 

/•oo 

< J exp^-e'^'^alP{x)Fd\u\x)^dFd'>{u\x) 



exp 

1 






(x)s] 



exp 



ds 

e^^an\x) 



e^'fa^\x) e«'i'ar(x) 

1 






< 



e"T'ai'\x) 
for the case of (x) > 0 and 



£°Ec (Qd\ulx,C)) dFd^uix) < 1 

for the case of an^(x) = 0. By substituting these inequalities into (5.7.26), 
we obtain 



Ec(Pr{(X",w)GT,(C)}) 

< ^ Px'.(x) h 

0 <KL:a<‘)(x )>0 (x) 






= E W E ^ 



xGA’^'- 

^ i + i _ 

^ enj 



0<l<L:aiP(x)>0 



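Clearly, $(L+1)/e^{n\gamma} \to 0$ as $n \to \infty$. This means that there must exist a realization $\mathcal{C}^*$ of the random code $\mathcal{C}$ satisfying

$$\Pr\{X^nY^n \in T_n^*\} = \Pr\{(X^n,\mathbf{w}) \in T_n^*\} \le \frac{L+1}{e^{n\gamma}} \to 0 \quad \text{as } n \to \infty, \tag{5.7.27}$$

where $T_n^* = T_n(\mathcal{C}^*)$. Denoting by $(\varphi_n, \psi_n)$ the code $(\tilde\varphi_n, \tilde\psi_n)$ with $\mathcal{C} = \mathcal{C}^*$, (5.7.23) and (5.7.25) can be written as

$$\frac{1}{n}E|\varphi_n(X^n)| \le \frac{1}{n}I(X^n;Y^n) + 2\gamma + \lambda_n, \tag{5.7.28}$$

$$\begin{aligned}
\frac{1}{n}E d_n(X^n, \psi_n(\varphi_n(X^n)))
&\le \frac{1}{n}E d_n(X^n,Y^n) \\
&\quad + \sum_{(\mathbf{x},\mathbf{y}) \in T_n^*} P_{X^nY^n}(\mathbf{x},\mathbf{y}) \left(\frac{1}{n} d_n(\mathbf{x},\mathbf{r}^{(n)})\right) \\
&\quad + \sum_{(\mathbf{x},\mathbf{y}) \in S_n^c} P_{X^nY^n}(\mathbf{x},\mathbf{y}) \left(\frac{1}{n} d_n(\mathbf{x},\mathbf{r}^{(n)})\right),
\end{aligned} \tag{5.7.29}$$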

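respectively. However, since the distortion measure $d_n$ is assumed to satisfy the uniform integrability, the second and the third terms in (5.7.29) turn out to satisfy

$$\sum_{(\mathbf{x},\mathbf{y}) \in T_n^*} P_{X^nY^n}(\mathbf{x},\mathbf{y}) \left(\frac{1}{n} d_n(\mathbf{x},\mathbf{r}^{(n)})\right) \to 0 \quad \text{as } n \to \infty,$$

$$\sum_{(\mathbf{x},\mathbf{y}) \in S_n^c} P_{X^nY^n}(\mathbf{x},\mathbf{y}) \left(\frac{1}{n} d_n(\mathbf{x},\mathbf{r}^{(n)})\right) \to 0 \quad \text{as } n \to \infty$$

due to (5.7.27) and (5.7.17), respectively. Therefore, we obtain

$$\limsup_{n\to\infty} \frac{1}{n}E|\varphi_n(X^n)| \le \limsup_{n\to\infty} \frac{1}{n}I(X^n;Y^n) + 2\gamma = I(\mathbf{X};\mathbf{Y}) + 2\gamma \le R, \tag{5.7.30}$$

$$\limsup_{n\to\infty} \frac{1}{n}E d_n(X^n, \psi_n(\varphi_n(X^n))) \le \limsup_{n\to\infty} \frac{1}{n}E d_n(X^n,Y^n) = D(\mathbf{X},\mathbf{Y}) \le D \tag{5.7.31}$$

from (5.7.15), (5.7.16), (5.7.28) and (5.7.29). Equations (5.7.30) and (5.7.31) show that $(R, D)$ is va-achievable. □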

We have developed Theorem 5.7.1 on the variable-length coding under the average distortion criterion. In the theorem a stochastic encoder $\varphi_n$ is assumed to be available. In order to realize this stochastic encoder, however, we need to generate $\mathbf{w}(\mathbf{x})$ (called the test codeword for $\mathbf{x}$) randomly subject to the probability distribution $P_{Y^n|X^n}(\cdot|\mathbf{x})$ for each source output $\mathbf{x} \in \mathcal{X}^n$ in advance. Once a collection $\mathbf{w} = \{\mathbf{w}(\mathbf{x})\}_{\mathbf{x} \in \mathcal{X}^n}$ of the test codewords $\mathbf{w}(\mathbf{x})$ is determined, we have a deterministic encoder $\varphi_n^{\mathbf{w}} : \mathcal{X}^n \to \mathcal{U}^*$. Readers may feel that we must prepare as many deterministic encoders $\varphi_n^{\mathbf{w}}$ as there are collections $\mathbf{w}$, because $\mathbf{w}(\mathbf{x})$ takes values in $\mathcal{Y}^n$ in general. Nevertheless, preparing only two deterministic encoders and choosing between them at random is actually enough for the encoding. This property is described in the following corollary:







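Corollary 5.7.1. Theorem 5.7.1 still holds if the stochastic variable-length encoder $\varphi_n : \mathcal{X}^n \to \mathcal{U}^*$ is restricted to the class of stochastic encoders such that two deterministic variable-length encoders $\varphi_n^{(1)} : \mathcal{X}^n \to \mathcal{U}^*$ and $\varphi_n^{(2)} : \mathcal{X}^n \to \mathcal{U}^*$ are randomly chosen with probabilities $\alpha_n^{(1)}$ and $\alpha_n^{(2)}$ with $\alpha_n^{(1)} + \alpha_n^{(2)} = 1$, respectively (cf. Hashimoto [48]).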

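Proof. Let $(\varphi_n, \psi_n)$ be a code constructed in the direct part of Theorem 5.7.1. Denoting by $\varphi_n^{\mathbf{w}}$ the deterministic encoder, we have

$$E|\varphi_n(X^n)| = \sum_{\mathbf{w}} \Pr\{\mathbf{w}\}\, E|\varphi_n^{\mathbf{w}}(X^n)|,$$

$$E d_n(X^n, \psi_n(\varphi_n(X^n))) = \sum_{\mathbf{w}} \Pr\{\mathbf{w}\}\, E d_n(X^n, \psi_n(\varphi_n^{\mathbf{w}}(X^n))).$$

Then, Eggleston's theorem [24] tells us that there exist $\mathbf{w} = \mathbf{w}_1$ and $\mathbf{w} = \mathbf{w}_2$ satisfying

$$\sum_{i=1}^{2} \alpha_n^{(i)} E|\varphi_n^{\mathbf{w}_i}(X^n)| \le \sum_{\mathbf{w}} \Pr\{\mathbf{w}\}\, E|\varphi_n^{\mathbf{w}}(X^n)|,$$

$$\sum_{i=1}^{2} \alpha_n^{(i)} E d_n(X^n, \psi_n(\varphi_n^{\mathbf{w}_i}(X^n))) \le \sum_{\mathbf{w}} \Pr\{\mathbf{w}\}\, E d_n(X^n, \psi_n(\varphi_n^{\mathbf{w}}(X^n)))$$

for some probability distribution $(\alpha_n^{(1)}, \alpha_n^{(2)})$ with $\alpha_n^{(1)} + \alpha_n^{(2)} = 1$. Therefore, by letting $\varphi_n^*$ be the stochastic encoder choosing the two deterministic encoders $\varphi_n^{\mathbf{w}_1}$ and $\varphi_n^{\mathbf{w}_2}$ with probabilities $\alpha_n^{(1)}$ and $\alpha_n^{(2)}$, respectively, and using $\psi_n$ as $\psi_n^*$, the code $(\varphi_n^*, \psi_n^*)$ satisfies the conditions (5.7.30) and (5.7.31) on the achievability. □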

The following theorem claims that the encoder can be deterministic if a source $\mathbf{X}$ is stationary and the distortion measure $d_n$ is subadditive (see §5.9 below).

Theorem 5.7.2. If a general source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is stationary, it holds that

$$R_{va}(D|\mathbf{X}) = \inf_{\mathbf{Y}: D(\mathbf{X},\mathbf{Y}) \le D} I(\mathbf{X};\mathbf{Y})$$

for any subadditive distortion measure $d_n$ satisfying the uniform integrability (5.3.4), where an encoder $\varphi_n$ is deterministic.

Proof. The stationary source has an invariant probabilistic structure with respect to the time shift. We first choose a sufficiently large block length $n = NL$ (suppose that both $N$ and $L$ are sufficiently large) and divide the block of length $n$ into $L$ subblocks of length $N$. Letting $\varphi_N^{(1)}$ and $\varphi_N^{(2)}$ be the two deterministic encoders given in the proof of Corollary 5.7.1, we use $\varphi_N^{(1)}$ for the first $\alpha_N^{(1)} L$ subblocks and $\varphi_N^{(2)}$ for the next $\alpha_N^{(2)} L$ subblocks. Then, the stochastic encoder given in the proof of Corollary 5.7.1 is realized as a deterministic encoder for the whole block of block length $n$. Note that, by virtue of the subadditivity of the distortion measure, the distortion of the whole block of length $n$ is upper-bounded by the sum of distortions of all subblocks of length $N$. □
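The time-sharing idea in this proof can be sketched numerically. The code below is only an illustration under stated assumptions: the function names are hypothetical, each deterministic encoder is summarized by a (rate, distortion) pair per subblock, and the number of subblocks assigned to the first encoder is rounded to an integer.

```python
def time_sharing_schedule(alpha1: float, num_subblocks: int) -> list:
    """Assign encoder 1 to the first round(alpha1 * L) subblocks, encoder 2 to the rest."""
    k = round(alpha1 * num_subblocks)
    return [1] * k + [2] * (num_subblocks - k)

def time_shared_performance(point1, point2, alpha1, num_subblocks):
    """Per-symbol rate and distortion bound of the concatenated deterministic encoder.

    point1 and point2 are hypothetical (rate, distortion) pairs of the two
    subblock encoders. The distortion value is an upper bound, justified by
    subadditivity of the distortion measure.
    """
    schedule = time_sharing_schedule(alpha1, num_subblocks)
    points = [point1 if s == 1 else point2 for s in schedule]
    rate = sum(p[0] for p in points) / num_subblocks
    distortion_bound = sum(p[1] for p in points) / num_subblocks
    return rate, distortion_bound
```

As the number of subblocks grows, the fraction of subblocks using the first encoder approaches the selection probability of the stochastic encoder, so the deterministic schedule reproduces the same averages.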



Remark 5.7.5. In fact, the infimums in the formulae (5.4.3), (5.5.2), (5.6.2) and (5.7.1) in Theorems 5.4.1-5.7.1 can be replaced with the minimums. This is because the infimums are taken with respect to general sources $\mathbf{Y}$. For example, in order to check that the infimum on the right-hand side of (5.6.2) in Theorem 5.6.1 can be replaced with the minimum, it is sufficient to show the existence of $\mathbf{Y}_0 = \{Y_0^n\}_{n=1}^{\infty}$ such that

$$I(\mathbf{X};\mathbf{Y}_0) \le r, \tag{5.7.32}$$

$$\overline{D}(\mathbf{X},\mathbf{Y}_0) \le D \tag{5.7.33}$$

whenever a sequence $\mathbf{Y}_i = \{Y_i^n\}_{n=1}^{\infty}$ $(i = 1, 2, \ldots)$ satisfying

$$\lim_{i\to\infty} I(\mathbf{X};\mathbf{Y}_i) = r, \tag{5.7.34}$$

$$\overline{D}(\mathbf{X},\mathbf{Y}_i) \le D \quad (i = 1, 2, \ldots) \tag{5.7.35}$$

is given, where $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is a given source. To this end, let $\{\varepsilon_n\}$ be a sequence of positive numbers satisfying $\varepsilon_1 > \varepsilon_2 > \cdots > 0$ and $\lim_{n\to\infty} \varepsilon_n = 0$. Then, (5.7.34) guarantees the existence of a sequence of integers $\{i_k\}$ satisfying

$$I(\mathbf{X};\mathbf{Y}_{i_k}) < r + \varepsilon_k,$$

where $i_k \to \infty$ as $k \to \infty$. In addition, from the definition of $I(\mathbf{X};\mathbf{Y}_{i_k})$ there exists a sequence of positive integers $\{n_k\}$ with $n_1 < n_2 < \cdots \to +\infty$ satisfying

$$\frac{1}{n} I(X^n;Y_{i_k}^n) < r + 2\varepsilon_k \quad (\forall n \ge n_k). \tag{5.7.36}$$

On the other hand, the definition of $\overline{D}(\mathbf{X},\mathbf{Y}_{i_k})$ and (5.7.35) tell us the existence of a sequence $\{m_k\}$ satisfying

$$\Pr\left\{ \frac{1}{n} d_n(X^n,Y_{i_k}^n) > D + \varepsilon_k \right\} < \varepsilon_k \quad (\forall n \ge m_k) \tag{5.7.37}$$

and $m_1 < m_2 < \cdots \to \infty$. Now, define $l_k = \max(n_k, m_k)$ and define $\mathbf{Y}_0 = \{Y_0^n\}_{n=1}^{\infty}$ by

$$Y_0^n = Y_{i_k}^n \quad \text{for } l_k \le n < l_{k+1}.$$

Then, (5.7.36) and (5.7.37) yield

$$\frac{1}{n} I(X^n;Y_0^n) < r + 2\varepsilon_k,$$

$$\Pr\left\{ \frac{1}{n} d_n(X^n,Y_0^n) > D + \varepsilon_k \right\} < \varepsilon_k$$

for $n \ge l_k$. Therefore, we obtain

$$\limsup_{n\to\infty} \frac{1}{n} I(X^n;Y_0^n) \le r,$$

$$\text{p-}\limsup_{n\to\infty} \frac{1}{n} d_n(X^n,Y_0^n) \le D,$$

which are exactly the same as (5.7.32) and (5.7.33), respectively.

Similarly, we can show that $R_{fm}(D|\mathbf{X})$, $R_{fa}(D|\mathbf{X})$, $R_{vm}(D|\mathbf{X})$ and $R_{va}(D|\mathbf{X})$ are lower semicontinuous with respect to $D$ and hence, together with their nonincreasing property, these functions are right-continuous. □



5.8 Rate-Distortion for Stationary Memoryless Sources 
Revisited 

So far we have developed the formulae of the rate-distortion functions $R_{fm}(D|\mathbf{X})$, $R_{fa}(D|\mathbf{X})$, $R_{vm}(D|\mathbf{X})$ and $R_{va}(D|\mathbf{X})$. Since the formulae are valid for general sources $\mathbf{X}$, readers may feel it straightforward to obtain claims corresponding to Theorem 5.2.1 if these formulae are applied to the stationary memoryless sources treated in §5.2. However, such an argument is not so simple. The difficulty arises because Theorems 5.4.1-5.7.1 describe the rate-distortion functions in quite a general manner in terms of the quantities used in the information-spectrum methods, such as $\overline{D}(\mathbf{X},\mathbf{Y})$, $D(\mathbf{X},\mathbf{Y})$, $\overline{I}(\mathbf{X};\mathbf{Y})$ and $I(\mathbf{X};\mathbf{Y})$, while the rate-distortion function in Theorem 5.2.1 is expressed in a completely different form by using $E d(X,Y)$, the expectation of the distortion, and the mutual information $I(X;Y)$. In this section, we start from Theorems 5.4.1-5.7.1 and try to obtain the claims corresponding to Theorem 5.2.1 by direct computation. Such computation is characteristic of the information-spectrum approach and is also of interest.

To begin with, we give the theorem that is our goal in this section (cf. Csiszár and Körner [19]).

Theorem 5.8.1. Let $\mathbf{X} = \{X^n = (X_1, X_2, \ldots, X_n)\}_{n=1}^{\infty}$ be the stationary memoryless source subject to a probability distribution $P_X$, where both a source alphabet $\mathcal{X}$ and a reproduction alphabet $\mathcal{Y}$ are assumed to be finite sets. If a distortion measure $d_n$ $(d \equiv d_1)$ is additive, it holds that

$$R_{fm}(D|\mathbf{X}) = R_{fa}(D|\mathbf{X}) = R_{vm}(D|\mathbf{X}) = R_{va}(D|\mathbf{X}) = \min_{Y: E d(X,Y) \le D} I(X;Y). \tag{5.8.1}$$






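Proof. For a given reproduction process $\mathbf{Y} = \{Y^n = (Y_1^{(n)}, Y_2^{(n)}, \ldots, Y_n^{(n)})\}_{n=1}^{\infty}$, define

$$\overline{\mathbf{Y}} = \left\{ \overline{Y}^n = (\overline{Y}_1^{(n)}, \overline{Y}_2^{(n)}, \ldots, \overline{Y}_n^{(n)}) \right\}_{n=1}^{\infty}$$

as the reproduction process such that $P_{X_i \overline{Y}_i^{(n)}} = P_{X_i Y_i^{(n)}}$ $(i = 1, 2, \ldots, n)$ and $X_1\overline{Y}_1^{(n)}, \ldots, X_n\overline{Y}_n^{(n)}$ are independent. Then, the following two lemmas hold: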



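Lemma 5.8.1. If an additive distortion measure $d_n$ satisfies (5.3.6), then

$$\overline{D}(\mathbf{X},\overline{\mathbf{Y}}) \le \overline{D}(\mathbf{X},\mathbf{Y}). \tag{5.8.2}$$

Proof. Since the distortion measure $d_n$ is additive, we have

$$\frac{1}{n} E d_n(X^n,\overline{Y}^n) = \frac{1}{n}\sum_{i=1}^{n} E d(X_i,\overline{Y}_i^{(n)}) = \frac{1}{n}\sum_{i=1}^{n} E d(X_i,Y_i^{(n)})$$

by taking $P_{X_i \overline{Y}_i^{(n)}} = P_{X_i Y_i^{(n)}}$ into consideration. Furthermore, by noticing that

$$0 \le d(X_i,\overline{Y}_i^{(n)}) \le d_{\max} \equiv \max_{x \in \mathcal{X}, y \in \mathcal{Y}} d(x,y) < +\infty,$$

Chebyshev's inequality tells us that

$$\begin{aligned}
\overline{D}(\mathbf{X},\overline{\mathbf{Y}}) &= \text{p-}\limsup_{n\to\infty} \frac{1}{n} d_n(X^n,\overline{Y}^n) \\
&= \limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} E d(X_i,\overline{Y}_i^{(n)}) \\
&= \limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} E d(X_i,Y_i^{(n)}) \\
&= \limsup_{n\to\infty} \frac{1}{n} E d_n(X^n,Y^n) \\
&= D(\mathbf{X},\mathbf{Y}).
\end{aligned}$$

On the other hand,

$$Z_n = \frac{1}{n} d_n(X^n,Y^n)$$

satisfies the uniform integrability from the condition (5.3.6). Then, Lemma 5.3.2 implies that $D(\mathbf{X},\mathbf{Y}) \le \overline{D}(\mathbf{X},\mathbf{Y})$. Consequently, we obtain $\overline{D}(\mathbf{X},\mathbf{Y}) \ge \overline{D}(\mathbf{X},\overline{\mathbf{Y}})$. □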



The inequality given in the following lemma can be regarded as an information-spectrum version of the mutual information inequality

$$\sum_{i=1}^{n} I(X_i;Y_i^{(n)}) = I(X^n;\overline{Y}^n) \le I(X^n;Y^n)$$

for a memoryless source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ (cf. Lemma 3.2.3 in Chapter 3).






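Lemma 5.8.2. Let at least one of a source alphabet $\mathcal{X}$ and a reproduction alphabet $\mathcal{Y}$ be a finite set. Then, for any memoryless (but not necessarily stationary) source $\mathbf{X}$ we have

$$\overline{I}(\mathbf{X};\overline{\mathbf{Y}}) \le \overline{I}(\mathbf{X};\mathbf{Y}).$$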



Proof. This lemma follows from the same argument used in the proof of 
Lemma 3.2.3 in Chapter 3. We use /(X; Y) and /(X; Y) instead of /(X; Y) 
and /(X; Y), respectively, and note the direction of the inequalities. We also 
use 




Pyr.{Y-) 



instead of 




TY”(Y"|X") 

Py..(Y") • 



Now, we return to the proof of Theorem 5.8.1. 



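a) While Theorem 5.4.1 claims that the fm-rate-distortion function is given by

$$R_{fm}(D|\mathbf{X}) = \inf_{\mathbf{Y}: \overline{D}(\mathbf{X},\mathbf{Y}) \le D} \overline{I}(\mathbf{X};\mathbf{Y}), \tag{5.8.3}$$

Lemma 5.8.1 and Lemma 5.8.2 guarantee that $\mathbf{Y}$ on the right-hand side of (5.8.3) can be restricted to memoryless $\mathbf{Y}$ satisfying

$$P_{X^nY^n}(\mathbf{x},\mathbf{y}) = \prod_{i=1}^{n} P_{X_iY_i^{(n)}}(x_i,y_i). \tag{5.8.4}$$

Since the alphabets are assumed to be finite sets, applying Chebyshev's inequality yields

$$\begin{aligned}
\overline{I}(\mathbf{X};\mathbf{Y}) &= \text{p-}\limsup_{n\to\infty} \frac{1}{n}\log\frac{P_{Y^n|X^n}(Y^n|X^n)}{P_{Y^n}(Y^n)} \\
&= \text{p-}\limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n}\log\frac{P_{Y_i^{(n)}|X_i}(Y_i^{(n)}|X_i)}{P_{Y_i^{(n)}}(Y_i^{(n)})} \\
&= \limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} I(X_i;Y_i^{(n)}).
\end{aligned} \tag{5.8.5}$$

Similarly, we also obtain

$$\overline{D}(\mathbf{X},\mathbf{Y}) = \limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} E d(X_i,Y_i^{(n)}). \tag{5.8.6}$$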






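Now, let $Q_n$ be the random variable with $P_{Q_n}(i) = \frac{1}{n}$ for $i = 1, 2, \ldots, n$ and define $XY_n$ by $XY_n = X_iY_i^{(n)}$ if $Q_n = i$. Then, (5.8.5) and (5.8.6) can be written as

$$\begin{aligned}
\overline{I}(\mathbf{X};\mathbf{Y}) &= \limsup_{n\to\infty} I(X;Y_n|Q_n) \\
&= \limsup_{n\to\infty} I(X;Q_nY_n) \\
&\ge \limsup_{n\to\infty} I(X;Y_n),
\end{aligned} \tag{5.8.7}$$

$$\overline{D}(\mathbf{X},\mathbf{Y}) = \limsup_{n\to\infty} E d(X,Y_n), \tag{5.8.8}$$

where the second equality in (5.8.7) follows from the independence of $X$ and $Q_n$. Now suppose that $\overline{D}(\mathbf{X},\mathbf{Y}) \le D$. Since (5.8.8) implies

$$E d(X,Y_n) \le D + \gamma \quad (\forall n \ge n_0)$$

for an arbitrarily small $\gamma > 0$, (5.8.7) yields

$$\overline{I}(\mathbf{X};\mathbf{Y}) \ge \min_{Y: E d(X,Y) \le D + \gamma} I(X;Y).$$

Recall here that the right-hand side of this inequality is continuous with respect to $\gamma$ from Remark 5.2.1. By letting $\gamma \to 0$, we have

$$\overline{I}(\mathbf{X};\mathbf{Y}) \ge \min_{Y: E d(X,Y) \le D} I(X;Y),$$

which leads to

$$h(D) \equiv \inf_{\mathbf{Y}: \overline{D}(\mathbf{X},\mathbf{Y}) \le D} \overline{I}(\mathbf{X};\mathbf{Y}) \ge \min_{Y: E d(X,Y) \le D} I(X;Y). \tag{5.8.9}$$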

On the other hand, consider $(X, Y)$ achieving the minimum in (5.8.9) and let $(X_1Y_1, \ldots, X_nY_n)$ be the memoryless process subject to the probability distribution $P_{XY}$. If we define

$$\mathbf{Y} = \left\{ Y^n = (Y_1^{(n)}, \ldots, Y_n^{(n)}) \right\}_{n=1}^{\infty}$$

by $P_{X_iY_i^{(n)}} = P_{XY}$ $(i = 1, 2, \ldots, n)$, we obtain

$$\overline{D}(\mathbf{X},\mathbf{Y}) = E d(X,Y) \le D,$$

$$\overline{I}(\mathbf{X};\mathbf{Y}) = I(X;Y)$$

from Chebyshev's inequality again. Consequently, it holds that

$$h(D) \le \min_{Y: E d(X,Y) \le D} I(X;Y). \tag{5.8.10}$$

Now we can conclude that

$$R_{fm}(D|\mathbf{X}) = h(D) = \min_{Y: E d(X,Y) \le D} I(X;Y) \tag{5.8.11}$$

from the combination of (5.8.9) and (5.8.10).



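b) For the same $\overline{\mathbf{Y}}$ in Lemma 5.8.1 and Lemma 5.8.2, it holds that

$$\begin{aligned}
D(\mathbf{X},\overline{\mathbf{Y}}) &= \limsup_{n\to\infty} \frac{1}{n} E d_n(X^n,\overline{Y}^n) \\
&= \limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} E d(X_i,\overline{Y}_i^{(n)}) \\
&= \limsup_{n\to\infty} \frac{1}{n} E d_n(X^n,Y^n) \\
&= D(\mathbf{X},\mathbf{Y}).
\end{aligned} \tag{5.8.12}$$

By using (5.8.12) and Lemma 5.8.2, we obtain

$$R_{fa}(D|\mathbf{X}) = \min_{Y: E d(X,Y) \le D} I(X;Y) \tag{5.8.13}$$

from the same argument as in a).

c) For the same $\overline{\mathbf{Y}}$ in Lemma 5.8.1 and Lemma 5.8.2, it holds that

$$\begin{aligned}
I(X^n;Y^n) &= H(X^n) - H(X^n|Y^n) \\
&= \sum_{i=1}^{n} H(X_i) - \sum_{i=1}^{n} H(X_i|X^{i-1}Y^n) \\
&\ge \sum_{i=1}^{n} H(X_i) - \sum_{i=1}^{n} H(X_i|Y_i^{(n)}) \\
&= \sum_{i=1}^{n} H(X_i) - \sum_{i=1}^{n} H(X_i|\overline{Y}_i^{(n)}) \\
&= \sum_{i=1}^{n} I(X_i;\overline{Y}_i^{(n)}) \\
&= I(X^n;\overline{Y}^n).
\end{aligned}$$

Then, it follows that

$$\begin{aligned}
I(\mathbf{X};\mathbf{Y}) &= \limsup_{n\to\infty} \frac{1}{n} I(X^n;Y^n) \\
&\ge \limsup_{n\to\infty} \frac{1}{n} I(X^n;\overline{Y}^n) \\
&= I(\mathbf{X};\overline{\mathbf{Y}}).
\end{aligned} \tag{5.8.14}$$

We obtain

$$R_{vm}(D|\mathbf{X}) = \min_{Y: E d(X,Y) \le D} I(X;Y)$$

by noticing Lemma 5.8.1 and using the same argument as in a).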



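d) Since $R_{fa}(D|\mathbf{X}) \ge R_{va}(D|\mathbf{X})$ from (5.3.11) in Theorem 5.3.1, (5.8.13) yields

$$R_{va}(D|\mathbf{X}) \le \min_{Y: E d(X,Y) \le D} I(X;Y). \tag{5.8.15}$$

On the other hand, (5.8.12), (5.8.14) and Theorem 5.7.1 imply

$$R_{va}(D|\mathbf{X}) \ge \inf_{\overline{\mathbf{Y}}: D(\mathbf{X},\overline{\mathbf{Y}}) \le D} I(\mathbf{X};\overline{\mathbf{Y}}).$$

Then, by using the same argument establishing (5.8.11), we obtain

$$\inf_{\overline{\mathbf{Y}}: D(\mathbf{X},\overline{\mathbf{Y}}) \le D} I(\mathbf{X};\overline{\mathbf{Y}}) \ge \min_{Y: E d(X,Y) \le D} I(X;Y),$$

which means

$$R_{va}(D|\mathbf{X}) \ge \min_{Y: E d(X,Y) \le D} I(X;Y). \tag{5.8.16}$$

Therefore, it follows from (5.8.15) and (5.8.16) that

$$R_{va}(D|\mathbf{X}) = \min_{Y: E d(X,Y) \le D} I(X;Y).$$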



Example 5.8.1 (Rate-distortion function for nonstationary memoryless sources). As an application of the argument above, let us consider the rate-distortion function for a memoryless, but nonstationary, source $\mathbf{X} = (X_1, X_2, \ldots)$. We can prove

$$R_{fm}(D|\mathbf{X}) = R_{fa}(D|\mathbf{X}) = R_{vm}(D|\mathbf{X}) = R_{va}(D|\mathbf{X}) \tag{5.8.17}$$

for such sources as well by using the argument in the proof of Theorem 5.8.1. Equation (5.8.17) is obtained because (5.8.5), (5.8.6),

$$I(\mathbf{X};\mathbf{Y}) = \limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} I(X_i;Y_i^{(n)}) \tag{5.8.18}$$

and

$$D(\mathbf{X},\mathbf{Y}) = \limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} E d(X_i,Y_i^{(n)}) \tag{5.8.19}$$

still hold for $\mathbf{Y}$ without stationarity if $\mathbf{Y}$ satisfies the memoryless property (5.8.4). For example, let us consider a memoryless source specified by

$$P_{X_i} = \begin{cases} P_{X_1} & (\text{if } i \text{ is odd}), \\ P_{X_2} & (\text{if } i \text{ is even}). \end{cases}$$

Define $R_1(D)$ and $R_2(D)$ by

$$R_1(D) = \min_{Y: E d(X_1,Y) \le D} I(X_1;Y),$$

$$R_2(D) = \min_{Y: E d(X_2,Y) \le D} I(X_2;Y).$$

Then, (5.8.5), (5.8.6), (5.8.18) and (5.8.19) imply that (5.8.17) coincides with the minimum of

$$\frac{1}{2}R_1(D_1) + \frac{1}{2}R_2(D_2)$$

with respect to all $(D_1, D_2)$ satisfying $\frac{1}{2}D_1 + \frac{1}{2}D_2 \le D$. □
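The minimization over $(D_1, D_2)$ can be explored numerically. The sketch below makes several assumptions that are not part of the example: both component sources are binary with Hamming distortion, their single-letter rate-distortion functions take the closed form of the binary case (rates in bits), and a simple grid search replaces the exact minimization. All function names are hypothetical.

```python
import math

def h(p: float) -> float:
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def binary_rd(p: float, D: float) -> float:
    """Rate-distortion function of a binary source with Hamming distortion."""
    p_min = min(p, 1 - p)
    return h(p_min) - h(D) if D < p_min else 0.0

def alternating_rd(p1: float, p2: float, D: float, steps: int = 2000) -> float:
    """Grid search for min of R1(D1)/2 + R2(D2)/2 subject to D1/2 + D2/2 = D.

    The constraint is taken with equality because both component functions
    are nonincreasing in the distortion level.
    """
    best = float("inf")
    for k in range(steps + 1):
        D1 = 2 * D * k / steps
        D2 = 2 * D - D1
        best = min(best, 0.5 * binary_rd(p1, D1) + 0.5 * binary_rd(p2, D2))
    return best
```

When the two component sources coincide, convexity makes the symmetric split $D_1 = D_2 = D$ optimal, so the search returns the ordinary single-source value; for unequal sources the optimum generally shifts distortion toward the component where it saves more rate.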



Example 5.8.2. Let us apply Theorem 5.8.1 to the following case. Let a source alphabet and a reproduction alphabet be $\mathcal{X} = \mathcal{Y} = \{0, 1\}$ and $\mathbf{X}$ the stationary memoryless source subject to a probability distribution $P_X$. Define an additive distortion measure $d$ by

$$d(x,y) = \begin{cases} 0 & \text{for } x = y, \\ 1 & \text{for } x \ne y \end{cases}$$

(the Hamming distortion measure). Setting $p_0 = P_X(0)$, $p_1 = P_X(1)$ and $p_{\min} = \min(p_0, p_1)$, the rate-distortion function $R(D) = R_{fm}(D) = R_{fa}(D) = R_{vm}(D) = R_{va}(D)$ is given by

$$R(D) = \begin{cases} h(p_{\min}) - h(D) & \text{for } 0 \le D < p_{\min}, \\ 0 & \text{for } D \ge p_{\min}, \end{cases} \tag{5.8.20}$$

where $h(\cdot)$ denotes the binary entropy. We can develop (5.8.20) in the following way. Without loss of generality, we can assume $p_{\min} = p_0$. First, consider the case of $D \ge p_{\min}$. Since defining $Y$ by $P_Y(1) = 1$ yields $E d(X,Y) = p_0 = p_{\min} \le D$, it is obvious from Theorem 5.8.1 that $R(D) = 0$. Next, consider the case of $0 \le D < p_{\min}$. We choose $Y$ satisfying $E d(X,Y) \le D$ and define $\varepsilon = \Pr\{X \ne Y\} = E d(X,Y)$. Then, by using the Fano inequality in §1.1, we obtain

$$\begin{aligned}
I(X;Y) &= H(X) - H(X|Y) \\
&\ge H(X) - h(\varepsilon) \\
&\ge H(X) - h(D) \\
&= h(p_{\min}) - h(D),
\end{aligned}$$

where the second inequality follows from $\varepsilon \le D < p_{\min} \le 1/2$. Notice that the lower bound $h(p_{\min}) - h(D)$ is attained if we choose $Y$ satisfying

$$P_Y(0) = \frac{p_0 - D}{1 - 2D}, \quad P_Y(1) = \frac{p_1 - D}{1 - 2D},$$

$$P_{X|Y}(0|0) = P_{X|Y}(1|1) = 1 - D, \quad P_{X|Y}(0|1) = P_{X|Y}(1|0) = D$$

(notice that $Y$ satisfies $E d(X,Y) = D$). Hence, we obtain (5.8.20) owing to Theorem 5.8.1. □
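The formula (5.8.20) and the achieving reproduction variable can be checked numerically. The following is a minimal sketch with hypothetical function names; entropies are measured in bits, which is an assumption about units only.

```python
import math

def h(p: float) -> float:
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rate_distortion_binary(p0: float, D: float) -> float:
    """R(D) of (5.8.20) for a binary source with Hamming distortion."""
    p_min = min(p0, 1 - p0)
    return h(p_min) - h(D) if D < p_min else 0.0

def backward_channel_check(p0: float, D: float):
    """Build the joint law from P_Y and P_{X|Y} of the example.

    Returns (I(X;Y), E d(X,Y), P_X(0)) so that the three claims of the
    example can be verified. Requires D <= min(p0, 1 - p0) and D < 1/2.
    """
    q0 = (p0 - D) / (1 - 2 * D)   # P_Y(0)
    q1 = 1 - q0                   # P_Y(1)
    joint = {                     # keys are (x, y)
        (0, 0): q0 * (1 - D), (1, 0): q0 * D,
        (0, 1): q1 * D,       (1, 1): q1 * (1 - D),
    }
    px = {0: joint[(0, 0)] + joint[(0, 1)], 1: joint[(1, 0)] + joint[(1, 1)]}
    py = {0: q0, 1: q1}
    mi = sum(p * math.log2(p / (px[x] * py[y]))
             for (x, y), p in joint.items() if p > 0)
    distortion = joint[(1, 0)] + joint[(0, 1)]
    return mi, distortion, px[0]
```

For instance, with $p_0 = 0.3$ and $D = 0.1$ the returned marginal of $X$ is again $0.3$, the expected distortion equals $D$, and the mutual information matches $h(p_{\min}) - h(D)$.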




5.9 Rate-Distortion for Stationary Ergodic Sources 



In the preceding section we have shown that Theorem 5.8.1 is obtained from Theorems 5.4.1-5.7.1 by direct computation of the rate-distortion functions expressed in terms of quantities of the information-spectrum. This section is devoted to the description of a stronger version of Theorem 5.8.1 obtained by using well-known facts in ergodic theory. We first introduce a class of distortion measures called the subadditive distortion measures. We call a distortion measure $d_n$ subadditive if $d_n$ satisfies

$$d_{n+m}(\mathbf{x}_1\mathbf{x}_2, \mathbf{y}_1\mathbf{y}_2) \le d_n(\mathbf{x}_1,\mathbf{y}_1) + d_m(\mathbf{x}_2,\mathbf{y}_2) \tag{5.9.1}$$

for any $n$, $m$, $\mathbf{x}_1 \in \mathcal{X}^n$, $\mathbf{x}_2 \in \mathcal{X}^m$, $\mathbf{y}_1 \in \mathcal{Y}^n$ and $\mathbf{y}_2 \in \mathcal{Y}^m$. Since additive distortion measures are clearly subadditive, the subadditive distortion measures can be regarded as extensions of the additive distortion measures treated in the preceding section. Notice that, as is easily verified from (5.9.1), any subadditive distortion measure $d_n$ satisfies

$$d_n(\mathbf{x},\mathbf{y}) \le \sum_{i=1}^{n} d_1(x_i,y_i) \tag{5.9.2}$$

for $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$. In fact, the term "subadditive" originates from this property. We have

$$d_n(\mathbf{x},\mathbf{y}) = \max_{1 \le i \le n} d_1(x_i,y_i),$$

$$d_n(\mathbf{x},\mathbf{y}) = \left( \sum_{i=1}^{n} d_1(x_i,y_i)^p \right)^{1/p} \quad (p > 1)$$

as examples of not additive but subadditive distortion measures $d_n$ (cf. Gray [31]).
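The subadditivity condition (5.9.1) can be spot-checked by random sampling. The sketch below is illustrative only: the per-letter distortion `d1` (absolute difference on real numbers) and all function names are assumptions, and a randomized check can refute subadditivity but cannot prove it.

```python
import random

def d1(x: float, y: float) -> float:
    """Hypothetical per-letter distortion."""
    return abs(x - y)

def d_max(xs, ys) -> float:
    """Maximum per-letter distortion."""
    return max(d1(x, y) for x, y in zip(xs, ys))

def d_pnorm(xs, ys, p: float = 2.0) -> float:
    """p-norm of the per-letter distortions (p > 1)."""
    return sum(d1(x, y) ** p for x, y in zip(xs, ys)) ** (1.0 / p)

def d_square_of_sum(xs, ys) -> float:
    """A measure that violates (5.9.1), included as a counterexample."""
    return sum(d1(x, y) for x, y in zip(xs, ys)) ** 2

def is_subadditive(dn, trials: int = 1000, seed: int = 0) -> bool:
    """Randomized check of d_{n+m}(x1x2, y1y2) <= d_n(x1, y1) + d_m(x2, y2)."""
    rng = random.Random(seed)
    for _ in range(trials):
        n, m = rng.randint(1, 5), rng.randint(1, 5)
        x = [rng.uniform(-1, 1) for _ in range(n + m)]
        y = [rng.uniform(-1, 1) for _ in range(n + m)]
        if dn(x, y) > dn(x[:n], y[:n]) + dn(x[n:], y[n:]) + 1e-12:
            return False
    return True
```

Both examples from the text pass the check, while the squared sum fails it because $(a + b)^2 > a^2 + b^2$ whenever $a, b > 0$.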

We have the following key lemma on a subadditive distortion measure dn- 



Lemma 5.9.1. Let a source alphabet $\mathcal{X}$ and a reproduction alphabet $\mathcal{Y}$ be arbitrary and consider a stationary process $\mathbf{X} = \{X^n = (X_1, X_2, \ldots, X_n)\}_{n=1}^{\infty}$ as a source. Then, a subadditive distortion measure $d_n$ satisfies the uniform integrability if and only if there exists an $r \in \mathcal{Y}$ satisfying

$$E d_1(X_1,r) < +\infty. \tag{5.9.3}$$

Proof.

1) Sufficiency:

Set $\mathbf{r}^{(n)} = (r, r, \ldots, r) \in \mathcal{Y}^n$. Since (5.9.2) guarantees

$$\frac{1}{n} d_n(X^n,\mathbf{r}^{(n)}) \le \frac{1}{n}\sum_{i=1}^{n} d_1(X_i,r),$$

it is sufficient to prove that

$$Z_n = \frac{1}{n}\sum_{i=1}^{n} d_1(X_i,r) \tag{5.9.4}$$

satisfies the uniform integrability. We note here that, if $E d_1(X_1,r) < +\infty$, the individual ergodic theorem together with the mean ergodic theorem tells us that $Z_n$ converges to the integrable function (the conditional expectation)

$$E[d_1(X_1,r)|\mathcal{I}]$$

both in the almost sure sense and in the $L_1$-sense (cf. Billingsley [8]). Hence, $\{Z_n\}_{n=1}^{\infty}$ satisfies the uniform integrability (cf. Billingsley [9]). Here, $\mathcal{I}$ denotes the $\sigma$-field of the set of all invariant sets under the shift on the stationary process $\mathbf{X}$.

2) Necessity:

Suppose that $d_n$ satisfies the uniform integrability. Then, there must exist a reference word $\mathbf{r}^{(n)} \in \mathcal{Y}^n$ satisfying (5.3.4). If we set $n = 1$ in particular, $E d_1(X_1,\mathbf{r}^{(1)}) < +\infty$ is obtained. □

By using Lemma 5.9.1 we can obtain the following theorem as a generalized version of Theorem 5.8.1.

Theorem 5.9.1 (Han [35]). Let a source alphabet $\mathcal{X}$ and a reproduction alphabet $\mathcal{Y}$ be arbitrary sets (not restricted to finite sets). If a source $\mathbf{X} = \{X^n = (X_1, \ldots, X_n)\}_{n=1}^{\infty}$ is stationary ergodic and a distortion measure $d_n$ is subadditive, it holds that

$$R_{fm}(D|\mathbf{X}) = R_{fa}(D|\mathbf{X}) = R_{vm}(D|\mathbf{X}) = R_{va}(D|\mathbf{X}) = \lim_{n\to\infty} \inf_{Y^n: \frac{1}{n}E d_n(X^n,Y^n) \le D} \frac{1}{n} I(X^n;Y^n), \tag{5.9.5}$$

that is, all of the four kinds of the rate-distortion functions coincide. Here, we assume the existence of an $r \in \mathcal{Y}$ satisfying $E d_1(X_1,r) < +\infty$.

Proof. It is known that

$$R_{fa}(D|\mathbf{X}) = \lim_{n\to\infty} \inf_{Y^n: \frac{1}{n}E d_n(X^n,Y^n) \le D} \frac{1}{n} I(X^n;Y^n) \equiv I_\infty(D) \tag{5.9.6}$$

for a stationary ergodic source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and a subadditive distortion measure $d_n$ satisfying the assumption of the theorem (see Mackenthun and Pursley [63] or Gray [31]). On the other hand, Steinberg and Verdú [85] recently showed that

$$R_{fm}(D|\mathbf{X}) = R_{fa}(D|\mathbf{X}) \tag{5.9.7}$$

under the assumption of the theorem. We first obtain

$$\begin{aligned}
R_{va}(D|\mathbf{X}) &= \limsup_{n\to\infty} \inf_{Y^n: \frac{1}{n}E d_n(X^n,Y^n) \le D} \frac{1}{n} I(X^n;Y^n) \\
&= \lim_{n\to\infty} \inf_{Y^n: \frac{1}{n}E d_n(X^n,Y^n) \le D} \frac{1}{n} I(X^n;Y^n) \\
&= I_\infty(D)
\end{aligned} \tag{5.9.8}$$

from Theorem 5.7.1 and Remark 5.7.3, where the second equality follows from the fact that the superior limit can be replaced with the limit if $\mathbf{X}$ is stationary and $d_n$ is subadditive (cf. Gray [31]). This, together with (5.9.6) and (5.9.7), yields

$$R_{va}(D|\mathbf{X}) = R_{fa}(D|\mathbf{X}) = R_{fm}(D|\mathbf{X}) = I_\infty(D). \tag{5.9.9}$$

On the other hand, since $E d_1(X_1,r) < +\infty$ from the assumption of the theorem, Lemma 5.9.1 guarantees that the distortion measure $d_n$ satisfies the uniform integrability. Then, we obtain

$$R_{fm}(D|\mathbf{X}) = R_{fa}(D|\mathbf{X}) = R_{vm}(D|\mathbf{X}) = R_{va}(D|\mathbf{X}) = I_\infty(D)$$

from Theorem 5.3.1 and (5.9.9). □



Theorem 5.9.1 immediately yields the following corollary that is also a generalization of Theorem 5.8.1.

Corollary 5.9.1. Let a source alphabet $\mathcal{X}$ and a reproduction alphabet $\mathcal{Y}$ be arbitrary sets (not restricted to finite sets) and consider the stationary memoryless source $\mathbf{X} = \{X^n = (X_1, X_2, \ldots, X_n)\}_{n=1}^{\infty}$ subject to a probability distribution $P_X$. If $d_n$ $(d \equiv d_1)$ is the additive distortion measure, it holds that

$$R_{fm}(D|\mathbf{X}) = R_{fa}(D|\mathbf{X}) = R_{vm}(D|\mathbf{X}) = R_{va}(D|\mathbf{X}) = \inf_{Y: E d(X,Y) \le D} I(X;Y), \tag{5.9.10}$$

where the existence of an $r \in \mathcal{Y}$ with $E d(X,r) < +\infty$ is assumed.

Proof. For an arbitrary random variable $Y^n = (Y_1, \ldots, Y_n)$ correlated with the random variable $X^n$ of the source, denote by $\overline{Y}^n = (\overline{Y}_1, \ldots, \overline{Y}_n)$ the random variable such that $\overline{Y}_1, \ldots, \overline{Y}_n$ are independent and satisfy $P_{X_i\overline{Y}_i} = P_{X_iY_i}$ for $i = 1, \ldots, n$. Notice that $X_1\overline{Y}_1, \ldots, X_n\overline{Y}_n$ are independent because the source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ is assumed to be stationary and memoryless. This, combined with the additivity of the distortion measure, yields the following properties:

$$I(X^n;Y^n) \ge I(X^n;\overline{Y}^n) = \sum_{i=1}^{n} I(X_i;\overline{Y}_i),$$

$$\frac{1}{n} E d_n(X^n,Y^n) = \frac{1}{n} E d_n(X^n,\overline{Y}^n) = \frac{1}{n}\sum_{i=1}^{n} E d(X_i,\overline{Y}_i).$$

By using these two properties and the argument in part a) of the proof of Theorem 5.8.1, we can establish the claim of this corollary by virtue of Theorem 5.9.1. □



Example 5.9.1. Let a source alphabet $\mathcal{X}$ and a reproduction alphabet $\mathcal{Y}$ be $\mathcal{X} = \mathcal{Y} = \mathbf{R} = (-\infty, +\infty)$. Define $\mathbf{X} = (X_1, X_2, \ldots)$ as the memoryless Gaussian source subject to the probability density function

$$P_X(x) = \frac{1}{\sqrt{2\pi P}}\, e^{-\frac{x^2}{2P}}.$$

We consider here the additive distortion measure $d(x,y) = (x - y)^2$. Since the condition (5.9.3) in Lemma 5.9.1 holds with $r = 0$, the claim of Corollary 5.9.1 is valid for $\mathbf{X}$. Let us define $K(D)$ as the rightmost side of (5.9.10) in Corollary 5.9.1 and compute $K(D)$. We first notice that $K(D) = 0$ for $D \ge P$ because choosing $Y$ with $P_Y(0) = 1$ yields $E d(X,Y) = P$ and $I(X;Y) = 0$. Next, we consider the case of $0 < D < P$. Letting $Y$ be an arbitrary random variable with $E d(X,Y) \le D$, we obtain

$$\begin{aligned}
I(X;Y) &= H(X) - H(X|Y) \\
&= H(X) - H(X - Y|Y) \qquad (5.9.11) \\
&\ge \frac{1}{2}\log(2\pi e P) - H(X - Y) \\
&\ge \frac{1}{2}\log(2\pi e P) - \frac{1}{2}\log\left(2\pi e\, E(X - Y)^2\right) \\
&= \frac{1}{2}\log(2\pi e P) - \frac{1}{2}\log\left(2\pi e\, E d(X,Y)\right) \\
&\ge \frac{1}{2}\log(2\pi e P) - \frac{1}{2}\log(2\pi e D) \\
&= \frac{1}{2}\log\frac{P}{D} \qquad (5.9.12)
\end{aligned}$$

in view of the maximum entropy theorem given in §3.7. On the other hand, if we choose the Gaussian random variable $Y$ with mean 0 and variance $P - D$ and consider another Gaussian random variable $Z$ with mean 0 and variance $D$ that is independent of $Y$, then $Y + Z$ turns out to be the random variable subject to $P_X$ of the source $\mathbf{X}$. This means that $X$ can be expressed as $X = Y + Z$. Obviously, we have $E d(X,Y) = D$ and $I(X;Y) = \frac{1}{2}\log\frac{P}{D}$ for such $Y$. Then, $K(D) = \frac{1}{2}\log\frac{P}{D}$ follows because the lower bound in (5.9.12) is attained. Summarizing, $R(D) = R_{fm}(D) = R_{fa}(D) = R_{vm}(D) = R_{va}(D)$ is given by

$$R(D) = \begin{cases} \dfrac{1}{2}\log\dfrac{P}{D} & \text{for } 0 < D < P, \\ 0 & \text{for } D \ge P. \end{cases}$$
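The Gaussian formula and the decomposition $X = Y + Z$ can be verified with a short computation. This is a minimal sketch with hypothetical function names; rates are in nats, and the mutual information of the jointly Gaussian pair is computed from the standard correlation-coefficient formula rather than from densities.

```python
import math

def gaussian_rd(P: float, D: float) -> float:
    """Rate-distortion function (in nats) of a Gaussian source with variance P
    under squared-error distortion."""
    if D <= 0:
        raise ValueError("D must be positive")
    return 0.5 * math.log(P / D) if D < P else 0.0

def forward_decomposition(P: float, D: float):
    """X = Y + Z with Y ~ N(0, P - D) and Z ~ N(0, D) independent (0 < D < P).

    Returns (Var X, E (X - Y)^2, I(X;Y)), where I(X;Y) = -0.5 * log(1 - rho^2)
    for a jointly Gaussian pair with correlation coefficient rho.
    """
    var_y, var_z = P - D, D
    var_x = var_y + var_z
    rho_sq = var_y / var_x        # Cov(X, Y)^2 / (Var X * Var Y)
    mutual_information = -0.5 * math.log(1.0 - rho_sq)
    return var_x, var_z, mutual_information
```

With $P = 4$ and $D = 1$, the decomposition reproduces variance 4 for $X$, expected distortion 1, and a mutual information equal to $\frac{1}{2}\log 4$.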




5.10 Rate-Distortion Function for Mixed Sources 

We have seen that all of the four kinds of rate-distortion functions $R_{fm}(D|\mathbf{X})$, $R_{fa}(D|\mathbf{X})$, $R_{vm}(D|\mathbf{X})$ and $R_{va}(D|\mathbf{X})$ coincide if we treat stationary memoryless sources or stationary ergodic sources under the subadditive distortion measure $d_n$. Readers may have the intuition that the four kinds of the rate-distortion functions always coincide. However, this intuition is not correct. In fact, it is rather usual that the four rate-distortion functions do not coincide.

This section is devoted to observations on the difference of the four rate-distortion functions. The results described in the preceding section tell us that we must consider nonergodic sources to find a source with the four rate-distortion functions different from each other. One of the simplest and the most important nonergodic sources is the mixed source (see §1.4). Here, recall that every stationary source (typically, with finite alphabets) can be expressed as a mixture of stationary ergodic sources (cf. Gray and Davisson [33]).

To begin with, let $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ be the mixed source of two arbitrary sources $\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$ and $\mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}$, where

$$P_{X^n}(\mathbf{x}) = \alpha_1 P_{X_1^n}(\mathbf{x}) + \alpha_2 P_{X_2^n}(\mathbf{x}) \quad (\mathbf{x} \in \mathcal{X}^n)$$

for some $\alpha_1$ and $\alpha_2$ satisfying $\alpha_1 > 0$, $\alpha_2 > 0$ and $\alpha_1 + \alpha_2 = 1$.

We first consider the fixed-length coding of the mixed source $\mathbf{X}$ under the maximum distortion criterion. In this section the distortion measure $d_n$ is arbitrary (not necessarily subadditive) unless stated otherwise.

Theorem 5.10.1 (Han [35]). Denote by $R_{fm}(D|\mathbf{X}_1)$ and $R_{fm}(D|\mathbf{X}_2)$ the fm-rate-distortion functions for sources $\mathbf{X}_1$ and $\mathbf{X}_2$, respectively. Then, it holds that

$$R_{fm}(D|\mathbf{X}) = \max\left(R_{fm}(D|\mathbf{X}_1), R_{fm}(D|\mathbf{X}_2)\right) \tag{5.10.1}$$

for the mixed source $\mathbf{X}$ of $\mathbf{X}_1$ and $\mathbf{X}_2$.

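We need the following lemma to prove this theorem, where $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ are defined by

$$P_{X^nY^n}(\mathbf{x},\mathbf{y}) = \alpha_1 P_{X_1^nY_1^n}(\mathbf{x},\mathbf{y}) + \alpha_2 P_{X_2^nY_2^n}(\mathbf{x},\mathbf{y})$$

for two arbitrary reproduction processes $\mathbf{Y}_1 = \{Y_1^n\}_{n=1}^{\infty}$ and $\mathbf{Y}_2 = \{Y_2^n\}_{n=1}^{\infty}$.

Lemma 5.10.1.

$$\overline{I}(\mathbf{X};\mathbf{Y}) = \max\left(\overline{I}(\mathbf{X}_1;\mathbf{Y}_1), \overline{I}(\mathbf{X}_2;\mathbf{Y}_2)\right).$$

Proof. This lemma is proved similarly to the proof of Lemma 3.3.1 in Chapter 3. We just use $\overline{I}(\mathbf{X};\mathbf{Y})$, $\overline{I}(\mathbf{X}_1;\mathbf{Y}_1)$ and $\overline{I}(\mathbf{X}_2;\mathbf{Y}_2)$ instead of $\underline{I}(\mathbf{X};\mathbf{Y})$, $\underline{I}(\mathbf{X}_1;\mathbf{Y}_1)$ and $\underline{I}(\mathbf{X}_2;\mathbf{Y}_2)$, respectively, and replace the minimum with the maximum. □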

Proof of Theorem 5.10.1. 

First, set and = X” x F". 

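Then, it follows from Lemma 1.4.2 that

$$\overline{D}(\mathbf{X},\mathbf{Y}) = \max\left(\overline{D}(\mathbf{X}_1,\mathbf{Y}_1), \overline{D}(\mathbf{X}_2,\mathbf{Y}_2)\right). \tag{5.10.2}$$

In addition, Lemma 5.10.1 tells us that

$$\overline{I}(\mathbf{X};\mathbf{Y}) = \max\left(\overline{I}(\mathbf{X}_1;\mathbf{Y}_1), \overline{I}(\mathbf{X}_2;\mathbf{Y}_2)\right). \tag{5.10.3}$$

Therefore, the right-hand side of (5.4.3) can be expressed as the infimum of

$$\max\left(\overline{I}(\mathbf{X}_1;\mathbf{Y}_1), \overline{I}(\mathbf{X}_2;\mathbf{Y}_2)\right) \tag{5.10.4}$$

subject to

$$\overline{D}(\mathbf{X}_1,\mathbf{Y}_1) \le D, \quad \overline{D}(\mathbf{X}_2,\mathbf{Y}_2) \le D. \tag{5.10.5}$$

By noticing that $\mathbf{Y}_1$ and $\mathbf{Y}_2$ can be separately varied under the constraint (5.10.5), the infimum of (5.10.4) coincides with

$$\max\left( \inf_{\mathbf{Y}_1: \overline{D}(\mathbf{X}_1,\mathbf{Y}_1) \le D} \overline{I}(\mathbf{X}_1;\mathbf{Y}_1),\ \inf_{\mathbf{Y}_2: \overline{D}(\mathbf{X}_2,\mathbf{Y}_2) \le D} \overline{I}(\mathbf{X}_2;\mathbf{Y}_2) \right),$$

which is equal to

$$\max\left(R_{fm}(D|\mathbf{X}_1), R_{fm}(D|\mathbf{X}_2)\right)$$

from (5.2.3). □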



Remark 5.10.1. Equation (5.10.1) implies that the distortion-rate function is given by

$$D_{fm}(R|\mathbf{X}) = \max\left(D_{fm}(R|\mathbf{X}_1), D_{fm}(R|\mathbf{X}_2)\right).$$
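As a concrete illustration of the maximum rule, one can take both component sources to be stationary memoryless binary sources with Hamming distortion, whose rate-distortion functions have the closed form of (5.8.20). That choice of components, the use of bits, and the function names are assumptions made only for this sketch.

```python
import math

def h(p: float) -> float:
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def binary_rd(p: float, D: float) -> float:
    """Closed-form rate-distortion function of a binary memoryless source."""
    p_min = min(p, 1 - p)
    return h(p_min) - h(D) if D < p_min else 0.0

def mixed_fm_rate(p1: float, p2: float, D: float) -> float:
    """fm-rate-distortion function of the mixture: the larger component value.

    The mixing weights do not appear, as long as both are strictly positive.
    """
    return max(binary_rd(p1, D), binary_rd(p2, D))
```

Under the maximum distortion criterion the code must work for whichever component is actually in effect, so the harder component alone determines the rate.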

Example 5.10.1. Theorem 5.10.1 can be generalized under an adequate condition in the following way. As was mentioned in Example 1.4.3 in §1.4, every stationary source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ can be expressed in the form of the mixed source as

$$P_{X^n}(A) = \int P_{X_\theta^n}(A)\, dw(\theta) \quad (\forall n = 1, 2, \ldots;\ \forall \text{measurable } A \subset \mathcal{X}^n) \tag{5.10.6}$$

(cf. Gray and Davisson [33]), where $\mathbf{X}_\theta = \{X_\theta^n\}_{n=1}^{\infty}$ denotes a stationary ergodic source specified by a parameter $\theta$ and $w(\cdot)$ is a probability measure determining the mixed source $\mathbf{X}$. Then, denoting by $R_{fm}(D|\mathbf{X}_\theta)$ the fm-rate-distortion function for the stationary ergodic source $\mathbf{X}_\theta$, the fm-rate-distortion function $R_{fm}(D|\mathbf{X})$ for the stationary source $\mathbf{X}$ is given by

$$R_{fm}(D|\mathbf{X}) = w\text{-ess.sup}\, R_{fm}(D|\mathbf{X}_\theta), \tag{5.10.7}$$

where $w$-ess.sup on the right-hand side means the essential supremum with respect to $w$. Equation (5.10.7) can also be written as

$$D_{fm}(R|\mathbf{X}) = w\text{-ess.sup}\, D_{fm}(R|\mathbf{X}_\theta) \tag{5.10.8}$$

in the form of the distortion-rate function.

Next, we consider the fixed-length coding of the mixed source X under 
the average distortion criterion. Under this criterion it is more convenient to 
use the distortion-rate function than the rate-distortion function. 

Theorem 5.10.2. Suppose that a distortion measure $d_n$ satisfies the uniform integrability (5.3.4) for the source $\mathbf{X}$. Denote by $D_{fa}(R|\mathbf{X}_1)$ and $D_{fa}(R|\mathbf{X}_2)$ the fa-distortion-rate functions for the sources $\mathbf{X}_1$ and $\mathbf{X}_2$, respectively. Then, it holds that

$$D_{fa}(R|\mathbf{X}) \le \alpha_1 D_{fa}(R|\mathbf{X}_1) + \alpha_2 D_{fa}(R|\mathbf{X}_2). \tag{5.10.9}$$

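Proof. The formula (5.5.2) in Theorem 5.5.1 is equivalently written in the form of the distortion-rate function as

$$D_{fa}(R|\mathbf{X}) = \inf_{\mathbf{Y}:\,\overline{I}(\mathbf{X};\mathbf{Y}) \le R} D(\mathbf{X},\mathbf{Y}). \tag{5.10.10}$$

Note that $D(\mathbf{X},\mathbf{Y})$ on the right-hand side can be evaluated in the following way:

$$\begin{aligned}
D(\mathbf{X},\mathbf{Y}) &= \limsup_{n\to\infty} \frac{1}{n}\mathrm{E}\,d_n(X^n,Y^n) \\
&= \limsup_{n\to\infty} \Bigl\{ \frac{\alpha_1}{n}\mathrm{E}\,d_n(X_1^n,Y_1^n) + \frac{\alpha_2}{n}\mathrm{E}\,d_n(X_2^n,Y_2^n) \Bigr\} \\
&\le \alpha_1 \limsup_{n\to\infty} \frac{1}{n}\mathrm{E}\,d_n(X_1^n,Y_1^n) + \alpha_2 \limsup_{n\to\infty} \frac{1}{n}\mathrm{E}\,d_n(X_2^n,Y_2^n) \\
&= \alpha_1 D(\mathbf{X}_1,\mathbf{Y}_1) + \alpha_2 D(\mathbf{X}_2,\mathbf{Y}_2).
\end{aligned} \tag{5.10.11}$$

On the other hand, (5.10.3) means that the constraint $\overline{I}(\mathbf{X};\mathbf{Y}) \le R$ is equivalent to

$$\overline{I}(\mathbf{X}_1;\mathbf{Y}_1) \le R \quad\text{and}\quad \overline{I}(\mathbf{X}_2;\mathbf{Y}_2) \le R. \tag{5.10.12}$$

Therefore, the infimum of $D(\mathbf{X},\mathbf{Y})$ with respect to all $\mathbf{Y}_1$ and $\mathbf{Y}_2$ satisfying (5.10.12) is upper-bounded by the infimum of

$$\alpha_1 D(\mathbf{X}_1,\mathbf{Y}_1) + \alpha_2 D(\mathbf{X}_2,\mathbf{Y}_2)$$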






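with respect to the same $\mathbf{Y}_1$ and $\mathbf{Y}_2$. Since $\mathbf{Y}_1$ and $\mathbf{Y}_2$ can be varied separately under the constraint (5.10.12), it follows from Theorem 5.5.1 that

$$\begin{aligned}
D_{fa}(R|\mathbf{X}) &\le \alpha_1 \inf_{\mathbf{Y}_1:\,\overline{I}(\mathbf{X}_1;\mathbf{Y}_1)\le R} D(\mathbf{X}_1,\mathbf{Y}_1) + \alpha_2 \inf_{\mathbf{Y}_2:\,\overline{I}(\mathbf{X}_2;\mathbf{Y}_2)\le R} D(\mathbf{X}_2,\mathbf{Y}_2) \\
&= \alpha_1 D_{fa}(R|\mathbf{X}_1) + \alpha_2 D_{fa}(R|\mathbf{X}_2).
\end{aligned}$$

Here, we use the fact that the distortion measure $d_n$ satisfies the uniform integrability for the sources $\mathbf{X}_1$ and $\mathbf{X}_2$, because $d_n$ is assumed to satisfy the uniform integrability for the source $\mathbf{X}$ by the assumption of the theorem. □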

Notice here that (5.10.9) in Theorem 5.10.2 holds not with equality but with inequality. Only the inequality holds for the case that $\frac{1}{n}\mathrm{E}\,d_n(X_1^n,Y_1^n)$ and $\frac{1}{n}\mathrm{E}\,d_n(X_2^n,Y_2^n)$ on the right-hand side of (5.10.11) synchronously “vibrate” as $n$ increases. This case, however, never occurs if at least one of these two has a limit. The limit exists, for example, if at least one of $\mathbf{X}_1$ and $\mathbf{X}_2$ is stationary ergodic. In such a case the inequality in (5.10.9) can be replaced with the equality.
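The role of the limit superior in (5.10.11) can be seen in a small numerical sketch. The two sequences below are purely hypothetical stand-ins for the normalized expected distortions of the two components (they are not derived from any source), and the limit superior is approximated by a maximum over a finite tail. When the oscillations are locked to each other so that the peaks never coincide, the inequality is strict; when one sequence has a limit, equality holds.

```python
def limsup_tail(seq, start):
    """Finite-horizon proxy for the limit superior: maximum over a tail."""
    return max(seq[start:])

N_MAX = 1000          # horizon of the illustration (assumption)
ALPHA1 = ALPHA2 = 0.5  # mixing weights (assumption)

# Hypothetical oscillating sequences with opposite phase.
a = [1.0 + (-1.0) ** n for n in range(1, N_MAX + 1)]
b = [1.0 - (-1.0) ** n for n in range(1, N_MAX + 1)]
# Hypothetical sequence that has a limit.
c = [1.0 for _ in range(N_MAX)]

def both_sides(x, y, start=500):
    """Return (limsup of the mixture, mixture of the limsups)."""
    mixed = [ALPHA1 * u + ALPHA2 * v for u, v in zip(x, y)]
    lhs = limsup_tail(mixed, start)
    rhs = ALPHA1 * limsup_tail(x, start) + ALPHA2 * limsup_tail(y, start)
    return lhs, rhs

strict_case = both_sides(a, b)  # oscillations locked together: lhs < rhs
equal_case = both_sides(a, c)   # one sequence converges: lhs == rhs
```

Here `strict_case` shows a gap between the two sides, while `equal_case` shows none, in line with the remark that equality is recovered as soon as one of the two sequences converges.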

Corollary 5.10.1 (Han [35]). Suppose that at least one of the sources $\mathbf{X}_1$ and $\mathbf{X}_2$ is stationary ergodic and the distortion measure $d_n$ is subadditive. Then, for the mixed source $\mathbf{X}$ of $\mathbf{X}_1$ and $\mathbf{X}_2$ it holds that

$$D_{fa}(R|\mathbf{X}) = \alpha_1 D_{fa}(R|\mathbf{X}_1) + \alpha_2 D_{fa}(R|\mathbf{X}_2), \tag{5.10.13}$$

where we assume that the distortion measure $d_n$ satisfies the uniform integrability (5.3.4) for the source $\mathbf{X}$.

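Proof. We first note that the distortion measure $d_n$ satisfies the uniform integrability for the sources $\mathbf{X}_1$ and $\mathbf{X}_2$ by the assumption on the uniform integrability for $\mathbf{X}$. We can assume without loss of generality that $\mathbf{X}_1$ is stationary and ergodic. Choosing an arbitrary $\mathbf{Y}$ satisfying $\overline{I}(\mathbf{X};\mathbf{Y}) \le R$ (that is, $\overline{I}(\mathbf{X}_1;\mathbf{Y}_1) \le R$ and $\overline{I}(\mathbf{X}_2;\mathbf{Y}_2) \le R$), Theorem 5.5.1 tells us that

$$\begin{aligned}
D(\mathbf{X},\mathbf{Y}) &= \limsup_{n\to\infty} \frac{1}{n}\mathrm{E}\,d_n(X^n,Y^n) \\
&= \limsup_{n\to\infty} \Bigl\{ \frac{\alpha_1}{n}\mathrm{E}\,d_n(X_1^n,Y_1^n) + \frac{\alpha_2}{n}\mathrm{E}\,d_n(X_2^n,Y_2^n) \Bigr\} \\
&\ge \alpha_1 \liminf_{n\to\infty} \frac{1}{n}\mathrm{E}\,d_n(X_1^n,Y_1^n) + \alpha_2 \limsup_{n\to\infty} \frac{1}{n}\mathrm{E}\,d_n(X_2^n,Y_2^n) \\
&\ge \alpha_1 \liminf_{n\to\infty} \frac{1}{n}\mathrm{E}\,d_n(X_1^n,Y_1^n) + \alpha_2 D_{fa}(R|\mathbf{X}_2).
\end{aligned} \tag{5.10.14}$$

In order to evaluate the first term on the right-hand side of (5.10.14), we recall the proof of the converse part of Theorem 5.5.1. Since it is clear that $\mathbf{Y} = \{Y^n = \psi_n(\varphi_n(X^n))\}_{n=1}^{\infty}$ defined in the proof satisfies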






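$$\limsup_{n\to\infty} \frac{1}{n} \log \bigl|\{\mathbf{y} \in \mathcal{Y}^n \mid P_{Y^n}(\mathbf{y}) > 0\}\bigr| \le R$$

from (5.5.27), $\mathbf{Y}_1 = \{Y_1^n\}_{n=1}^{\infty}$ defined above also satisfies

$$\limsup_{n\to\infty} \frac{1}{n} \log \bigl|\{\mathbf{y} \in \mathcal{Y}^n \mid P_{Y_1^n}(\mathbf{y}) > 0\}\bigr| \le R.$$

Then, we can show that $I(\mathbf{X}_1;\mathbf{Y}_1) \le \overline{I}(\mathbf{X}_1;\mathbf{Y}_1)$ in the same way as in the proof of Theorem 3.5.2, where we use $\mathbf{X}_1$ and $\mathbf{Y}_1$ instead of $\mathbf{X}$ and $\mathbf{Y}$, respectively. This allows us to take the infimum of the first term on the right-hand side of (5.10.14) with respect to $\mathbf{Y}_1$ satisfying

$$I(\mathbf{X}_1;\mathbf{Y}_1) \le \overline{I}(\mathbf{X}_1;\mathbf{Y}_1) \le R.$$

Notice here that $I(\mathbf{X}_1;\mathbf{Y}_1) \le R$ implies

$$\frac{1}{n} I(X_1^n;Y_1^n) \le R + \varepsilon \qquad (\forall n \ge n_0)$$

for an arbitrarily small $\varepsilon > 0$. Now, define

$$D_n(R) = \inf_{Y_1^n:\,\frac{1}{n} I(X_1^n;Y_1^n) \le R} \frac{1}{n}\mathrm{E}\,d_n(X_1^n,Y_1^n)$$

and note that

$$D_\infty(R) = \lim_{n\to\infty} D_n(R)$$

exists because the source $\mathbf{X}_1$ is stationary and $d_n$ is a subadditive distortion measure (cf. Gray [31]). Then, (5.10.14) leads to

$$\begin{aligned}
D(\mathbf{X},\mathbf{Y}) &\ge \alpha_1 \liminf_{n\to\infty} \inf_{Y_1^n:\,\frac{1}{n} I(X_1^n;Y_1^n) \le R+\varepsilon} \frac{1}{n}\mathrm{E}\,d_n(X_1^n,Y_1^n) + \alpha_2 D_{fa}(R|\mathbf{X}_2) \\
&= \alpha_1 \liminf_{n\to\infty} D_n(R+\varepsilon) + \alpha_2 D_{fa}(R|\mathbf{X}_2) \\
&= \alpha_1 D_\infty(R+\varepsilon) + \alpha_2 D_{fa}(R|\mathbf{X}_2).
\end{aligned}$$

Since $D_\infty(R)$ is convex with respect to $R$ and hence continuous with respect to $R$, by letting $\varepsilon \to 0$ we have

$$\begin{aligned}
D(\mathbf{X},\mathbf{Y}) &\ge \alpha_1 D_\infty(R) + \alpha_2 D_{fa}(R|\mathbf{X}_2) \\
&= \alpha_1 D_{fa}(R|\mathbf{X}_1) + \alpha_2 D_{fa}(R|\mathbf{X}_2),
\end{aligned}$$

where we use Theorem 5.9.1 and the fact that $D_\infty(R)$ is the inverse of $I_\infty(D)$ (see (5.9.6) in §5.9). Consequently, it follows that

$$D_{fa}(R|\mathbf{X}) \ge \alpha_1 D_{fa}(R|\mathbf{X}_1) + \alpha_2 D_{fa}(R|\mathbf{X}_2),$$

which, together with Theorem 5.10.2, yields (5.10.13). □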




Example 5.10.2. Corollary 5.10.1 can be generalized under an adequate condition in the following way. Consider the ergodic decomposition of an arbitrary stationary source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ as given by (5.10.6) in Example 5.10.1. Letting $d_n$ be a subadditive distortion measure and denoting by $D_{fa}(R|\mathbf{X}_\theta)$ the fa-distortion-rate function for the stationary ergodic source $\mathbf{X}_\theta$, the fa-distortion-rate function $D_{fa}(R|\mathbf{X})$ for the stationary source $\mathbf{X}$ is given by

$$D_{fa}(R|\mathbf{X}) = \int D_{fa}(R|\mathbf{X}_\theta)\, dw(\theta) \tag{5.10.15}$$

(see Gray and Davisson [32] for the case that $d_n$ is additive). □



Next, let us consider the variable-length coding of the mixed source X 
under the maximum distortion criterion. 

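Theorem 5.10.3. Let $R_{vm}(D|\mathbf{X}_1)$ and $R_{vm}(D|\mathbf{X}_2)$ be the vm-rate-distortion functions for the sources $\mathbf{X}_1$ and $\mathbf{X}_2$, respectively. Then, it holds that

$$R_{vm}(D|\mathbf{X}) \le \alpha_1 R_{vm}(D|\mathbf{X}_1) + \alpha_2 R_{vm}(D|\mathbf{X}_2). \tag{5.10.16}$$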

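Proof. Denote by $Q$ the random variable satisfying $P_Q(1) = \alpha_1$ and $P_Q(2) = \alpha_2$, and define $(X^n, Y^n) = (X_i^n, Y_i^n)$ if $Q = i$ for $i = 1, 2$. Then, it follows that

$$\begin{aligned}
\frac{1}{n} I(X^n;Y^n) &= \frac{1}{n} I(X^n;Y^n|Q) + \frac{1}{n} I(X^n;Q) - \frac{1}{n} I(X^n;Q|Y^n) \\
&= \frac{1}{n} I(X^n;Y^n|Q) + \mu_n,
\end{aligned} \tag{5.10.17}$$

where

$$\mu_n = \frac{1}{n} I(X^n;Q) - \frac{1}{n} I(X^n;Q|Y^n).$$

Note that $\mu_n \to 0$ as $n \to \infty$ because $Q$ takes only two values. By noticing

$$\frac{1}{n} I(X^n;Y^n|Q) = \frac{\alpha_1}{n} I(X_1^n;Y_1^n) + \frac{\alpha_2}{n} I(X_2^n;Y_2^n), \tag{5.10.18}$$

we have

$$\begin{aligned}
I(\mathbf{X};\mathbf{Y}) &= \limsup_{n\to\infty} \frac{1}{n} I(X^n;Y^n) \\
&= \limsup_{n\to\infty} \Bigl( \frac{\alpha_1}{n} I(X_1^n;Y_1^n) + \frac{\alpha_2}{n} I(X_2^n;Y_2^n) \Bigr) \\
&\le \alpha_1 \limsup_{n\to\infty} \frac{1}{n} I(X_1^n;Y_1^n) + \alpha_2 \limsup_{n\to\infty} \frac{1}{n} I(X_2^n;Y_2^n) \\
&= \alpha_1 I(\mathbf{X}_1;\mathbf{Y}_1) + \alpha_2 I(\mathbf{X}_2;\mathbf{Y}_2).
\end{aligned} \tag{5.10.19}$$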




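On the other hand, (5.10.2) implies that $\overline{D}(\mathbf{X},\mathbf{Y}) \le D$ is equivalent to

$$\overline{D}(\mathbf{X}_1,\mathbf{Y}_1) \le D \quad\text{and}\quad \overline{D}(\mathbf{X}_2,\mathbf{Y}_2) \le D. \tag{5.10.20}$$

Accordingly, in view of (5.10.19), the infimum of $I(\mathbf{X};\mathbf{Y})$ with respect to $\mathbf{Y}_1$ and $\mathbf{Y}_2$ satisfying the constraint (5.10.20) is upper-bounded by the infimum of

$$\alpha_1 I(\mathbf{X}_1;\mathbf{Y}_1) + \alpha_2 I(\mathbf{X}_2;\mathbf{Y}_2)$$

with respect to $\mathbf{Y}_1$ and $\mathbf{Y}_2$ meeting the same constraint. Since $\mathbf{Y}_1$ and $\mathbf{Y}_2$ can be varied separately under (5.10.20), we finally obtain

$$\begin{aligned}
R_{vm}(D|\mathbf{X}) &\le \inf_{\mathbf{Y}_1,\mathbf{Y}_2:\,(5.10.20)} \bigl(\alpha_1 I(\mathbf{X}_1;\mathbf{Y}_1) + \alpha_2 I(\mathbf{X}_2;\mathbf{Y}_2)\bigr) \\
&= \alpha_1 \inf_{\mathbf{Y}_1:\,\overline{D}(\mathbf{X}_1,\mathbf{Y}_1)\le D} I(\mathbf{X}_1;\mathbf{Y}_1) + \alpha_2 \inf_{\mathbf{Y}_2:\,\overline{D}(\mathbf{X}_2,\mathbf{Y}_2)\le D} I(\mathbf{X}_2;\mathbf{Y}_2) \\
&= \alpha_1 R_{vm}(D|\mathbf{X}_1) + \alpha_2 R_{vm}(D|\mathbf{X}_2)
\end{aligned}$$

from Theorem 5.6.1, which establishes (5.10.16). □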
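The decomposition behind (5.10.17) and (5.10.18) can be checked numerically for block length one. The following is a minimal sketch under stated assumptions: the two component joint distributions are toy examples chosen only for illustration, the function name `mixture_informations` is hypothetical, and natural logarithms are used. It also confirms that the correction term is bounded by $\log 2$, which is the reason it vanishes after division by $n$.

```python
import math

def mixture_informations(alpha1, joint1, joint2):
    """Mutual informations for a two-component mixture (toy, block length 1).

    joint1, joint2: dicts {(x, y): prob} for the two components.
    Returns I(X;Y), I(X;Y|Q), I(X;Q) and I(X;Q|Y) in nats.
    """
    # Joint distribution of (Q, X, Y): Q selects the component.
    p = {}
    for q, (a, joint) in enumerate(((alpha1, joint1), (1.0 - alpha1, joint2))):
        for (x, y), v in joint.items():
            p[(q, x, y)] = a * v

    def entropy(keep):
        """Entropy of the marginal on the coordinates listed in `keep`."""
        marg = {}
        for key, v in p.items():
            sub = tuple(key[i] for i in keep)
            marg[sub] = marg.get(sub, 0.0) + v
        return -sum(v * math.log(v) for v in marg.values() if v > 0.0)

    Q, X, Y = 0, 1, 2
    return {
        "I_xy": entropy((X,)) + entropy((Y,)) - entropy((X, Y)),
        "I_xy_given_q": entropy((Q, X)) + entropy((Q, Y))
                        - entropy((Q, X, Y)) - entropy((Q,)),
        "I_xq": entropy((X,)) + entropy((Q,)) - entropy((Q, X)),
        "I_xq_given_y": entropy((X, Y)) + entropy((Q, Y))
                        - entropy((Q, X, Y)) - entropy((Y,)),
    }

# Toy components (assumptions): a noiseless pair and an independent pair.
noiseless = {(0, 0): 0.5, (1, 1): 0.5}
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
info = mixture_informations(0.3, noiseless, independent)
correction = info["I_xq"] - info["I_xq_given_y"]  # plays the role of n * mu_n
```

In this toy case `info["I_xy_given_q"]` equals the weighted sum of the component mutual informations, and `correction` stays within $\log 2$ in absolute value.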

As mentioned for the fixed-length coding under the average distortion criterion (see Theorem 5.10.2), the inequality of (5.10.16) in Theorem 5.10.3 is due to the synchronous vibration of $\mathbf{X}_1$ and $\mathbf{X}_2$. One of the sufficient conditions making (5.10.16) hold with equality is given in the following corollary.

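Corollary 5.10.2 (Han [35]). Suppose that at least one of $\mathbf{X}_1$ and $\mathbf{X}_2$ is stationary ergodic and $d_n$ is a subadditive distortion measure. Then, for the mixed source $\mathbf{X}$ of $\mathbf{X}_1$ and $\mathbf{X}_2$ it holds that

$$R_{vm}(D|\mathbf{X}) = \alpha_1 R_{vm}(D|\mathbf{X}_1) + \alpha_2 R_{vm}(D|\mathbf{X}_2), \tag{5.10.21}$$

where, letting $\mathbf{X}_1 = (X_1^{(1)}, X_2^{(1)}, \ldots)$ be stationary ergodic, we assume the existence of an $r \in \mathcal{Y}$ such that

$$\mathrm{E}\,d_1(X_1^{(1)}, r) < +\infty.$$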

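Proof. Without loss of generality, we can assume that $\mathbf{X}_1$ is stationary ergodic. For an arbitrary $\mathbf{Y}$ satisfying $\overline{D}(\mathbf{X},\mathbf{Y}) \le D$ (that is, $\overline{D}(\mathbf{X}_1,\mathbf{Y}_1) \le D$ and $\overline{D}(\mathbf{X}_2,\mathbf{Y}_2) \le D$), Theorem 5.6.1 tells us that

$$\begin{aligned}
I(\mathbf{X};\mathbf{Y}) &= \limsup_{n\to\infty} \frac{1}{n} I(X^n;Y^n) \\
&= \limsup_{n\to\infty} \Bigl\{ \frac{\alpha_1}{n} I(X_1^n;Y_1^n) + \frac{\alpha_2}{n} I(X_2^n;Y_2^n) \Bigr\} \\
&\ge \alpha_1 \liminf_{n\to\infty} \frac{1}{n} I(X_1^n;Y_1^n) + \alpha_2 \limsup_{n\to\infty} \frac{1}{n} I(X_2^n;Y_2^n) \\
&\ge \alpha_1 \liminf_{n\to\infty} \frac{1}{n} I(X_1^n;Y_1^n) + \alpha_2 R_{vm}(D|\mathbf{X}_2).
\end{aligned} \tag{5.10.22}$$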






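We recall here the proof of the converse part of Theorem 5.6.1 so as to evaluate the first term on the right-hand side of (5.10.22). Suppose that an arbitrary variable-length code $(\varphi_n, \psi_n)$ satisfying (5.6.3) and (5.6.4) is given. We can construct another variable-length code $(\tilde{\varphi}_n, \tilde{\psi}_n)$ using this code. We first note that the distortion measure $d_n$ satisfies the uniform integrability for the source $\mathbf{X}_1$ with the reference word $r^{(n)} = (r, r, \cdots, r) \in \mathcal{Y}^n$ by the assumption of the corollary and Lemma 5.9.1. We define an encoder $\tilde{\varphi}_n: \mathcal{X}^n \to \mathcal{U}^*$ by $\tilde{\varphi}_n(\mathbf{x}) = 1\varphi_n(\mathbf{x})$ if

$$d_n(\mathbf{x}, r^{(n)}) > d_n(\mathbf{x}, \psi_n(\varphi_n(\mathbf{x})))$$

and $\tilde{\varphi}_n(\mathbf{x}) = 0$ if

$$d_n(\mathbf{x}, r^{(n)}) \le d_n(\mathbf{x}, \psi_n(\varphi_n(\mathbf{x}))).$$

Letting $\tilde{\varphi}_n(\mathbf{x}) = u\mathbf{u}$ be the output of the encoder $\tilde{\varphi}_n$, a decoder $\tilde{\psi}_n: \mathcal{U}^* \to \mathcal{Y}^n$ is defined by $\tilde{\psi}_n(u\mathbf{u}) = \psi_n(\mathbf{u})$ if $u = 1$ and $\tilde{\psi}_n(u\mathbf{u}) = r^{(n)}$ if $u = 0$, where $u$ is the leftmost symbol in the output. Then, we clearly have

$$d_n(\mathbf{x}, \tilde{\psi}_n(\tilde{\varphi}_n(\mathbf{x}))) \le d_n(\mathbf{x}, r^{(n)}) \qquad (\forall \mathbf{x} \in \mathcal{X}^n), \tag{5.10.23}$$

$$d_n(\mathbf{x}, \tilde{\psi}_n(\tilde{\varphi}_n(\mathbf{x}))) \le d_n(\mathbf{x}, \psi_n(\varphi_n(\mathbf{x}))) \qquad (\forall \mathbf{x} \in \mathcal{X}^n). \tag{5.10.24}$$

In addition, we have

$$\limsup_{n\to\infty} \frac{1}{n} \mathrm{E}\,|\tilde{\varphi}_n(X^n)| \le R \tag{5.10.25}$$

from (5.6.3) and

$$|\tilde{\varphi}_n(\mathbf{x})| \le |\varphi_n(\mathbf{x})| + 1 \qquad (\forall \mathbf{x} \in \mathcal{X}^n).$$

On the other hand, since the distortion measure $d_n$ satisfies the uniform integrability for the source $\mathbf{X}_1$ with the reference word $r^{(n)}$, (5.10.23) means that

$$\frac{1}{n} d_n(X_1^n, \tilde{\psi}_n(\tilde{\varphi}_n(X_1^n))) \tag{5.10.26}$$

satisfies the uniform integrability. Thus, by setting $\tilde{\mathbf{Y}}_1 = \{\tilde{Y}_1^n\}_{n=1}^{\infty}$ ($\tilde{Y}_1^n = \tilde{\psi}_n(\tilde{\varphi}_n(X_1^n))$), it follows from Lemma 5.3.2 that

$$D(\mathbf{X}_1, \tilde{\mathbf{Y}}_1) = \limsup_{n\to\infty} \frac{1}{n}\mathrm{E}\,d_n(X_1^n, \tilde{Y}_1^n) \le \text{p-}\limsup_{n\to\infty} \frac{1}{n} d_n(X_1^n, \tilde{Y}_1^n) = \overline{D}(\mathbf{X}_1, \tilde{\mathbf{Y}}_1). \tag{5.10.27}$$

Furthermore, in view of (5.10.24), (5.6.4) and Lemma 1.4.2 in §1.4, setting $\tilde{\mathbf{Y}} = \{\tilde{Y}^n\}_{n=1}^{\infty}$ ($\tilde{Y}^n = \tilde{\psi}_n(\tilde{\varphi}_n(X^n))$) implies that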






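$$\begin{aligned}
\overline{D}(\mathbf{X}_1, \tilde{\mathbf{Y}}_1) &= \text{p-}\limsup_{n\to\infty} \frac{1}{n} d_n(X_1^n, \tilde{Y}_1^n) \\
&\le \text{p-}\limsup_{n\to\infty} \frac{1}{n} d_n(X^n, \tilde{Y}^n) \\
&\le \text{p-}\limsup_{n\to\infty} \frac{1}{n} d_n(X^n, \psi_n(\varphi_n(X^n))) \\
&\le D.
\end{aligned} \tag{5.10.28}$$

Hence, we obtain

$$D(\mathbf{X}_1, \tilde{\mathbf{Y}}_1) \le \overline{D}(\mathbf{X}_1, \tilde{\mathbf{Y}}_1) \le D \tag{5.10.29}$$

from (5.10.27) and (5.10.28). We note here that

$$\overline{D}(\mathbf{X}, \tilde{\mathbf{Y}}) = \text{p-}\limsup_{n\to\infty} \frac{1}{n} d_n(X^n, \tilde{Y}^n) \le D \tag{5.10.30}$$

is included in (5.10.28). We also note that

$$I(\mathbf{X}; \tilde{\mathbf{Y}}) \le R \tag{5.10.31}$$

is obtained from (5.10.25) similarly to the proof of the converse part of Theorem 5.6.1. Then, the combination of (5.10.29), (5.10.30) and (5.10.31) implies that the infimum in (5.6.2) in Theorem 5.6.1 with respect to $\mathbf{Y}$ and $\mathbf{Y}_1$ can be restricted to $\tilde{\mathbf{Y}}$ and $\tilde{\mathbf{Y}}_1$ satisfying (5.10.30) and (5.10.29), respectively. Hereafter, we simply write such $\tilde{\mathbf{Y}} = \{\tilde{Y}^n\}_{n=1}^{\infty}$ and $\tilde{\mathbf{Y}}_1 = \{\tilde{Y}_1^n\}_{n=1}^{\infty}$ as $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ and $\mathbf{Y}_1 = \{Y_1^n\}_{n=1}^{\infty}$, respectively. Now, since $D(\mathbf{X}_1, \mathbf{Y}_1) \le D$ from (5.10.29), for an arbitrarily small $\varepsilon > 0$ it holds that

$$\frac{1}{n}\mathrm{E}\,d_n(X_1^n, Y_1^n) \le D + \varepsilon \qquad (\forall n \ge n_0).$$

Setting

$$I_n(D) = \inf_{Y_1^n:\,\frac{1}{n}\mathrm{E}\,d_n(X_1^n,Y_1^n) \le D} \frac{1}{n} I(X_1^n;Y_1^n)$$

and noting that the limit $I_\infty(D) = \lim_{n\to\infty} I_n(D)$ exists, it follows from (5.10.22) that

$$\begin{aligned}
I(\mathbf{X};\mathbf{Y}) &\ge \alpha_1 \liminf_{n\to\infty} \inf_{Y_1^n:\,\frac{1}{n}\mathrm{E}\,d_n(X_1^n,Y_1^n) \le D+\varepsilon} \frac{1}{n} I(X_1^n;Y_1^n) + \alpha_2 R_{vm}(D|\mathbf{X}_2) \\
&= \alpha_1 \liminf_{n\to\infty} I_n(D+\varepsilon) + \alpha_2 R_{vm}(D|\mathbf{X}_2) \\
&= \alpha_1 I_\infty(D+\varepsilon) + \alpha_2 R_{vm}(D|\mathbf{X}_2).
\end{aligned}$$

We recall here that $I_\infty(D)$ is convex with respect to $D$ and therefore continuous with respect to $D$. Thus, by letting $\varepsilon \to 0$, we obtain

$$\begin{aligned}
I(\mathbf{X};\mathbf{Y}) &\ge \alpha_1 I_\infty(D) + \alpha_2 R_{vm}(D|\mathbf{X}_2) \\
&= \alpha_1 R_{vm}(D|\mathbf{X}_1) + \alpha_2 R_{vm}(D|\mathbf{X}_2),
\end{aligned}$$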






where the equality follows from Theorem 5.9.1. Consequently, we have

$$R_{vm}(D|\mathbf{X}) \ge \alpha_1 R_{vm}(D|\mathbf{X}_1) + \alpha_2 R_{vm}(D|\mathbf{X}_2).$$

This, together with Theorem 5.10.3, yields (5.10.21). □



Example 5.10.3. Corollary 5.10.2 can be generalized in the following way under an adequate condition. For an arbitrarily given stationary source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$, consider the ergodic decomposition of $\mathbf{X}$ in the form of the mixed source as given in (5.10.6) in Example 5.10.1. Suppose that $d_n$ is a subadditive distortion measure and denote by $R_{vm}(D|\mathbf{X}_\theta)$ the vm-rate-distortion function for the stationary ergodic source $\mathbf{X}_\theta$. Then, the vm-rate-distortion function $R_{vm}(D|\mathbf{X})$ for the stationary source $\mathbf{X}$ is given by

$$R_{vm}(D|\mathbf{X}) = \int R_{vm}(D|\mathbf{X}_\theta)\, dw(\theta). \tag{5.10.32}$$

Finally, we consider the variable-length coding under the average distortion criterion for the mixed source $\mathbf{X}$ of $\mathbf{X}_1$ and $\mathbf{X}_2$.

Theorem 5.10.4. Suppose that a distortion measure $d_n$ satisfies the uniform integrability (5.3.4) for the mixed source $\mathbf{X}$. Denote by $R_{va}(D|\mathbf{X}_1)$ and $R_{va}(D|\mathbf{X}_2)$ the va-rate-distortion functions for the sources $\mathbf{X}_1$ and $\mathbf{X}_2$, respectively. Then, it holds that

$$R_{va}(D|\mathbf{X}) \le \inf_{(D_1,D_2):\,\alpha_1 D_1 + \alpha_2 D_2 \le D} \bigl(\alpha_1 R_{va}(D_1|\mathbf{X}_1) + \alpha_2 R_{va}(D_2|\mathbf{X}_2)\bigr), \tag{5.10.33}$$

where the infimum on the right-hand side of (5.10.33) is taken with respect to all $(D_1, D_2)$ satisfying $\alpha_1 D_1 + \alpha_2 D_2 \le D$.

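Proof. It easily follows that

$$\begin{aligned}
I(\mathbf{X};\mathbf{Y}) &= \limsup_{n\to\infty} \frac{1}{n} I(X^n;Y^n) \\
&= \limsup_{n\to\infty} \Bigl\{ \frac{\alpha_1}{n} I(X_1^n;Y_1^n) + \frac{\alpha_2}{n} I(X_2^n;Y_2^n) \Bigr\} \\
&\le \alpha_1 \limsup_{n\to\infty} \frac{1}{n} I(X_1^n;Y_1^n) + \alpha_2 \limsup_{n\to\infty} \frac{1}{n} I(X_2^n;Y_2^n) \\
&= \alpha_1 I(\mathbf{X}_1;\mathbf{Y}_1) + \alpha_2 I(\mathbf{X}_2;\mathbf{Y}_2).
\end{aligned} \tag{5.10.34}$$

On the other hand, we have

$$\begin{aligned}
D(\mathbf{X},\mathbf{Y}) &= \limsup_{n\to\infty} \frac{1}{n}\mathrm{E}\,d_n(X^n,Y^n) \\
&= \limsup_{n\to\infty} \Bigl\{ \frac{\alpha_1}{n}\mathrm{E}\,d_n(X_1^n,Y_1^n) + \frac{\alpha_2}{n}\mathrm{E}\,d_n(X_2^n,Y_2^n) \Bigr\} \\
&\le \alpha_1 \limsup_{n\to\infty} \frac{1}{n}\mathrm{E}\,d_n(X_1^n,Y_1^n) + \alpha_2 \limsup_{n\to\infty} \frac{1}{n}\mathrm{E}\,d_n(X_2^n,Y_2^n) \\
&= \alpha_1 D(\mathbf{X}_1,\mathbf{Y}_1) + \alpha_2 D(\mathbf{X}_2,\mathbf{Y}_2).
\end{aligned} \tag{5.10.35}$$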






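Therefore, if $D(\mathbf{X}_1,\mathbf{Y}_1) \le D_1$ and $D(\mathbf{X}_2,\mathbf{Y}_2) \le D_2$ for an arbitrary $(D_1, D_2)$ satisfying $\alpha_1 D_1 + \alpha_2 D_2 \le D$, (5.10.35) tells us that $D(\mathbf{X},\mathbf{Y}) \le D$. Since the assumption that $d_n$ satisfies the uniform integrability for $\mathbf{X}$ implies the uniform integrability for $\mathbf{X}_1$ and $\mathbf{X}_2$, it follows from Theorem 5.7.1 and (5.10.34) that

$$\begin{aligned}
R_{va}(D|\mathbf{X}) &= \inf_{\mathbf{Y}:\,D(\mathbf{X},\mathbf{Y}) \le D} I(\mathbf{X};\mathbf{Y}) \\
&\le \alpha_1 \inf_{\mathbf{Y}_1:\,D(\mathbf{X}_1,\mathbf{Y}_1) \le D_1} I(\mathbf{X}_1;\mathbf{Y}_1) + \alpha_2 \inf_{\mathbf{Y}_2:\,D(\mathbf{X}_2,\mathbf{Y}_2) \le D_2} I(\mathbf{X}_2;\mathbf{Y}_2) \\
&= \alpha_1 R_{va}(D_1|\mathbf{X}_1) + \alpha_2 R_{va}(D_2|\mathbf{X}_2).
\end{aligned} \tag{5.10.36}$$

By taking the infimum on the right-hand side of (5.10.36) with respect to all $(D_1, D_2)$ satisfying $\alpha_1 D_1 + \alpha_2 D_2 \le D$, we obtain (5.10.33). □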



The following corollary gives a sufficient condition for (5.10.33) to hold with equality.

Corollary 5.10.3. If both $\mathbf{X}_1$ and $\mathbf{X}_2$ are stationary and $d_n$ is subadditive, it holds that

$$R_{va}(D|\mathbf{X}) = \inf_{(D_1,D_2):\,\alpha_1 D_1 + \alpha_2 D_2 \le D} \bigl(\alpha_1 R_{va}(D_1|\mathbf{X}_1) + \alpha_2 R_{va}(D_2|\mathbf{X}_2)\bigr) \tag{5.10.37}$$

for the mixed source $\mathbf{X}$ of $\mathbf{X}_1$ and $\mathbf{X}_2$, where, denoting the mixed source by $\mathbf{X} = (X_1, X_2, \ldots)$, we assume the existence of an $r \in \mathcal{Y}$ satisfying $\mathrm{E}\,d_1(X_1, r) < +\infty$.

Remark 5.10.2. The formula (5.10.37) is equivalently expressed in the form of the distortion-rate function as

$$D_{va}(R|\mathbf{X}) = \inf_{(R_1,R_2):\,\alpha_1 R_1 + \alpha_2 R_2 \le R} \bigl(\alpha_1 D_{va}(R_1|\mathbf{X}_1) + \alpha_2 D_{va}(R_2|\mathbf{X}_2)\bigr).$$

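Proof of Corollary 5.10.3.

Since

$$\begin{aligned}
D(\mathbf{X},\mathbf{Y}) &= \limsup_{n\to\infty} \frac{1}{n}\mathrm{E}\,d_n(X^n,Y^n) \\
&= \limsup_{n\to\infty} \Bigl\{ \frac{\alpha_1}{n}\mathrm{E}\,d_n(X_1^n,Y_1^n) + \frac{\alpha_2}{n}\mathrm{E}\,d_n(X_2^n,Y_2^n) \Bigr\},
\end{aligned}$$

for an arbitrarily small $\gamma > 0$ we have

$$\frac{\alpha_1}{n}\mathrm{E}\,d_n(X_1^n,Y_1^n) + \frac{\alpha_2}{n}\mathrm{E}\,d_n(X_2^n,Y_2^n) \le D(\mathbf{X},\mathbf{Y}) + \gamma \qquad (\forall n \ge n_0). \tag{5.10.38}$$

On the other hand, since (5.10.17) and (5.10.18) yield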






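$$\begin{aligned}
I(\mathbf{X};\mathbf{Y}) &= \limsup_{n\to\infty} \frac{1}{n} I(X^n;Y^n) \\
&= \limsup_{n\to\infty} \Bigl\{ \frac{\alpha_1}{n} I(X_1^n;Y_1^n) + \frac{\alpha_2}{n} I(X_2^n;Y_2^n) \Bigr\},
\end{aligned}$$

we obtain

$$\frac{\alpha_1}{n} I(X_1^n;Y_1^n) + \frac{\alpha_2}{n} I(X_2^n;Y_2^n) \le I(\mathbf{X};\mathbf{Y}) + \gamma \qquad (\forall n \ge n_0). \tag{5.10.39}$$

Setting

$$D_1^{(n)} = \frac{1}{n}\mathrm{E}\,d_n(X_1^n,Y_1^n), \qquad D_2^{(n)} = \frac{1}{n}\mathrm{E}\,d_n(X_2^n,Y_2^n),$$

(5.10.38) can be written as

$$\alpha_1 D_1^{(n)} + \alpha_2 D_2^{(n)} \le D(\mathbf{X},\mathbf{Y}) + \gamma \qquad (\forall n \ge n_0). \tag{5.10.40}$$

Then, (5.10.39) leads to

$$\alpha_1 \inf_{Y_1^n:\,\frac{1}{n}\mathrm{E}\,d_n(X_1^n,Y_1^n) \le D_1^{(n)}} \frac{1}{n} I(X_1^n;Y_1^n) + \alpha_2 \inf_{Y_2^n:\,\frac{1}{n}\mathrm{E}\,d_n(X_2^n,Y_2^n) \le D_2^{(n)}} \frac{1}{n} I(X_2^n;Y_2^n) \le I(\mathbf{X};\mathbf{Y}) + \gamma. \tag{5.10.41}$$

Setting

$$I_n^{(1)}(D) = \inf_{Y_1^n:\,\frac{1}{n}\mathrm{E}\,d_n(X_1^n,Y_1^n) \le D} \frac{1}{n} I(X_1^n;Y_1^n), \qquad I_n^{(2)}(D) = \inf_{Y_2^n:\,\frac{1}{n}\mathrm{E}\,d_n(X_2^n,Y_2^n) \le D} \frac{1}{n} I(X_2^n;Y_2^n)$$

for simplicity, (5.10.41) can be written as

$$\alpha_1 I_n^{(1)}(D_1^{(n)}) + \alpha_2 I_n^{(2)}(D_2^{(n)}) \le I(\mathbf{X};\mathbf{Y}) + \gamma.$$

Then, in view of (5.10.40) it holds that

$$\inf_{(D_1,D_2):\,\alpha_1 D_1 + \alpha_2 D_2 \le D(\mathbf{X},\mathbf{Y}) + \gamma} \bigl(\alpha_1 I_n^{(1)}(D_1) + \alpha_2 I_n^{(2)}(D_2)\bigr) \le I(\mathbf{X};\mathbf{Y}) + \gamma. \tag{5.10.42}$$

We use here the properties that $I_n^{(1)}(D_1)$ and $I_n^{(2)}(D_2)$ have the limits

$$I_\infty^{(1)}(D_1) = \lim_{n\to\infty} I_n^{(1)}(D_1), \qquad I_\infty^{(2)}(D_2) = \lim_{n\to\infty} I_n^{(2)}(D_2)$$

as $n \to \infty$ and satisfy

$$I_n^{(1)}(D_1) \ge I_\infty^{(1)}(D_1) \quad (\forall n = 1, 2, \ldots), \qquad I_n^{(2)}(D_2) \ge I_\infty^{(2)}(D_2) \quad (\forall n = 1, 2, \ldots),$$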






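which follow from the stationarity of $\mathbf{X}_1$ and $\mathbf{X}_2$ and the subadditivity of $d_n$ (cf. Gray [31]). By using these properties, we obtain

$$\inf_{(D_1,D_2):\,\alpha_1 D_1 + \alpha_2 D_2 \le D(\mathbf{X},\mathbf{Y}) + \gamma} \bigl(\alpha_1 I_\infty^{(1)}(D_1) + \alpha_2 I_\infty^{(2)}(D_2)\bigr) \le I(\mathbf{X};\mathbf{Y}) + \gamma \tag{5.10.43}$$

from (5.10.42). Defining

$$g(D) = \inf_{(D_1,D_2):\,\alpha_1 D_1 + \alpha_2 D_2 \le D} \bigl(\alpha_1 I_\infty^{(1)}(D_1) + \alpha_2 I_\infty^{(2)}(D_2)\bigr), \tag{5.10.44}$$

(5.10.43) can be expressed as

$$g(D(\mathbf{X},\mathbf{Y}) + \gamma) \le I(\mathbf{X};\mathbf{Y}) + \gamma. \tag{5.10.45}$$

Notice here that $g(D)$ is convex, and hence continuous, as a function of $D$, which can be easily verified from the convexity of $I_\infty^{(1)}(D_1)$ and $I_\infty^{(2)}(D_2)$. By virtue of this continuity, we can obtain $g(D(\mathbf{X},\mathbf{Y})) \le I(\mathbf{X};\mathbf{Y})$, i.e.,

$$\inf_{(D_1,D_2):\,\alpha_1 D_1 + \alpha_2 D_2 \le D(\mathbf{X},\mathbf{Y})} \bigl(\alpha_1 I_\infty^{(1)}(D_1) + \alpha_2 I_\infty^{(2)}(D_2)\bigr) \le I(\mathbf{X};\mathbf{Y}) \tag{5.10.46}$$

by letting $\gamma \to 0$ in (5.10.45). Here, we also notice that, since the assumption $\mathrm{E}\,d_1(X_1, r) < +\infty$ of the corollary can be written as

$$\alpha_1 \mathrm{E}\,d_1(X_1^{(1)}, r) + \alpha_2 \mathrm{E}\,d_1(X_1^{(2)}, r) < +\infty,$$

where

$$\mathbf{X}_1 = (X_1^{(1)}, X_2^{(1)}, \ldots), \qquad \mathbf{X}_2 = (X_1^{(2)}, X_2^{(2)}, \ldots),$$

it must hold that

$$\mathrm{E}\,d_1(X_1^{(1)}, r) < +\infty, \qquad \mathrm{E}\,d_1(X_1^{(2)}, r) < +\infty.$$

Therefore, in view of Lemma 5.9.1 the distortion measure $d_n$ satisfies the uniform integrability for both $\mathbf{X}_1$ and $\mathbf{X}_2$. Since we have

$$R_{va}(D_1|\mathbf{X}_1) = I_\infty^{(1)}(D_1), \qquad R_{va}(D_2|\mathbf{X}_2) = I_\infty^{(2)}(D_2)$$

from Theorem 5.7.1 and Remark 5.7.3, (5.10.46) can be expressed as

$$\inf_{(D_1,D_2):\,\alpha_1 D_1 + \alpha_2 D_2 \le D(\mathbf{X},\mathbf{Y})} \bigl(\alpha_1 R_{va}(D_1|\mathbf{X}_1) + \alpha_2 R_{va}(D_2|\mathbf{X}_2)\bigr) \le I(\mathbf{X};\mathbf{Y}).$$

Now we can conclude

$$R_{va}(D|\mathbf{X}) \ge \inf_{(D_1,D_2):\,\alpha_1 D_1 + \alpha_2 D_2 \le D} \bigl(\alpha_1 R_{va}(D_1|\mathbf{X}_1) + \alpha_2 R_{va}(D_2|\mathbf{X}_2)\bigr)$$

because we have

$$R_{va}(D|\mathbf{X}) = \inf_{\mathbf{Y}:\,D(\mathbf{X},\mathbf{Y}) \le D} I(\mathbf{X};\mathbf{Y})$$

from Theorem 5.7.1 again. This, together with (5.10.33) in Theorem 5.10.4, yields (5.10.37). □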






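Example 5.10.4. Corollary 5.10.3 can be generalized under an adequate condition in the following way. Consider the ergodic decomposition of an arbitrary stationary source $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ as given in (5.10.6) in Example 5.10.1. Suppose that $d_n$ is a subadditive distortion measure and denote by $R_{va}(D|\mathbf{X}_\theta)$ the va-rate-distortion function for the stationary ergodic source $\mathbf{X}_\theta$. Then, the va-rate-distortion function $R_{va}(D|\mathbf{X})$ for the stationary source $\mathbf{X}$ is given by

$$\begin{aligned}
R_{va}(D|\mathbf{X}) &= \lim_{n\to\infty} \inf_{Y^n:\,\frac{1}{n}\mathrm{E}\,d_n(X^n,Y^n) \le D} \frac{1}{n} I(X^n;Y^n) \\
&= \inf_{\{D_\theta\}:\,\int D_\theta\, dw(\theta) \le D} \int R_{va}(D_\theta|\mathbf{X}_\theta)\, dw(\theta),
\end{aligned} \tag{5.10.47}$$

where the right-hand side represents the infimum with respect to all $\{D_\theta\}_{\theta \in \Theta}$ satisfying $\int D_\theta\, dw(\theta) \le D$ (see Leon-Garcia, Davisson and Neuhoff [61] for the case that $d_n$ is the additive distortion measure and both the source alphabet $\mathcal{X}$ and the reproduction alphabet $\mathcal{Y}$ are finite sets). This can also be expressed in terms of the distortion-rate function as

$$\begin{aligned}
D_{va}(R|\mathbf{X}) &= \lim_{n\to\infty} \inf_{Y^n:\,\frac{1}{n} I(X^n;Y^n) \le R} \frac{1}{n}\mathrm{E}\,d_n(X^n,Y^n) \\
&= \inf_{\{R_\theta\}:\,\int R_\theta\, dw(\theta) \le R} \int D_{va}(R_\theta|\mathbf{X}_\theta)\, dw(\theta)
\end{aligned} \tag{5.10.48}$$

(see Shields, Neuhoff, Davisson and Ledrappier [82] for the case that $d_n$ is the additive distortion measure (Hamming distortion measure) and both $\mathcal{X}$ and $\mathcal{Y}$ are finite sets). □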



Remark 5.10.3. If in Corollary 5.10.2 we consider the special case that both $\mathbf{X}_1$ and $\mathbf{X}_2$ are stationary ergodic, Theorem 5.9.1 implies that

$$R_{vm}(D|\mathbf{X}) = \alpha_1 R_{fa}(D|\mathbf{X}_1) + \alpha_2 R_{fa}(D|\mathbf{X}_2).$$

However, this does not always coincide with the rate-distortion function $R_{fa}(D|\mathbf{X})$ corresponding to (5.10.13) in Corollary 5.10.1. This means that $R_{vm}(D|\mathbf{X})$ is different from $R_{fa}(D|\mathbf{X})$ for mixed sources. In addition, Theorem 5.10.1 tells us that $R_{fm}(D|\mathbf{X})$ is different from these two rate-distortion functions. Furthermore, the formula (5.10.37) in Corollary 5.10.3 gives a rate-distortion function different from these three. Summarizing, the four kinds of rate-distortion functions, $R_{fm}(D|\mathbf{X})$, $R_{fa}(D|\mathbf{X})$, $R_{vm}(D|\mathbf{X})$ and $R_{va}(D|\mathbf{X})$, are generally different except for the special case that $\mathbf{X}$ is stationary ergodic (Theorem 5.9.1). □
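The difference among the four functions can be made concrete with a minimal numerical sketch. It rests on assumptions that are not part of the text: both components are taken to be i.i.d. binary sources under the Hamming distortion, so that each component function is Shannon's $R(D) = h(p) - h(D)$ for $D < \min(p, 1-p)$ (a standard closed form, used here without proof); the formulas of Theorem 5.10.1 and Corollaries 5.10.1 to 5.10.3 are then evaluated directly. All function names are hypothetical, and rates are in nats.

```python
import math

def h(x):
    """Binary entropy in nats (zero outside the open unit interval)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def rate(p, D):
    """Shannon R(D) of an i.i.d. Bernoulli(p) source, Hamming distortion."""
    q = min(p, 1.0 - p)
    return 0.0 if D >= q else h(q) - h(D)

def distortion(p, R):
    """Distortion-rate function: smallest D with rate(p, D) <= R (bisection)."""
    lo, hi = 0.0, min(p, 1.0 - p)
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if rate(p, mid) > R:
            lo = mid
        else:
            hi = mid
    return hi

def four_rates(p1, p2, alpha1, D, grid=2000):
    """Evaluate the fm, fa, vm and va formulas for a two-component mixture."""
    alpha2 = 1.0 - alpha1
    r1, r2 = rate(p1, D), rate(p2, D)
    r_fm = max(r1, r2)                    # Theorem 5.10.1
    r_vm = alpha1 * r1 + alpha2 * r2      # (5.10.21)
    # fa: invert the distortion-rate formula (5.10.13) by bisection on R.
    lo, hi = 0.0, math.log(2.0)
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if alpha1 * distortion(p1, mid) + alpha2 * distortion(p2, mid) > D:
            lo = mid
        else:
            hi = mid
    r_fa = hi
    # va: grid search over the distortion split in (5.10.37).
    r_va = r_vm  # the split D1 = D2 = D is always admissible
    for k in range(grid + 1):
        d1 = (D / alpha1) * k / grid
        d2 = (D - alpha1 * d1) / alpha2
        r_va = min(r_va, alpha1 * rate(p1, d1) + alpha2 * rate(p2, d2))
    return {"fm": r_fm, "fa": r_fa, "vm": r_vm, "va": r_va}

rates = four_rates(p1=0.5, p2=0.1, alpha1=0.5, D=0.2)
```

For these illustrative parameters the four values in `rates` are pairwise distinct, with the va value the smallest and the fm value the largest. The grid search and bisection are crude numerical devices chosen for brevity, not part of the theory.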



6 Identification Code and Channel 
Resolvability 



6.1 Identification Code and Channel Resolvability 

In the channel coding problems treated in Chapter 3 we considered an $(n, M_n, \varepsilon_n)$-code, which is usually called the transmission code. It was Shannon [77] who, in 1948, first formulated the channel coding problem based on the notion of the transmission code. After Shannon’s study, it was widely accepted that a code for a channel means the transmission code.

However, the code proposed by Ahlswede and Dueck [4] in 1989, which is called the identification code, is based on a different notion: many hypothesis testing problems are realized simultaneously over a single channel. Ahlswede and Dueck obtained the surprising result that the maximum number of messages that can be transmitted through a channel by using the identification code is a double-exponential function of the block length $n$, while the maximum number of messages is an exponential function of the block length $n$ if we use the transmission code (Theorem 3.2.1). On the other hand, Han and Verdú considered an approximation problem for the output distribution of a channel and defined the quantity called the channel resolvability as a dual counterpart of the ordinary channel capacity. Through their study, Han and Verdú unveiled a deep relationship between the identification coding problem and the channel resolvability problem.

It was obvious that both the identification coding problem and the channel resolvability problem lie one step outside the framework of classical information theory. However, the true significance of the two problems is that their essential solution really requires a new information theory treating nonstationary and nonergodic sources and channels. We need the new theory because basic properties due to stationarity and ergodicity, such as the weak law of large numbers and the asymptotic equipartition property, are no longer available. In this book we revisit all the areas treated in classical information theory from the viewpoint of nonstationary and nonergodic information theory and attempt to reconstruct them based on the notion of the information-spectrum. (In this sense this book can be regarded as a book of “nonstationary and nonergodic theory.”)






In this chapter we deal with the identification coding problem and the 
channel resolvability problem, both of which contain the meaning explained 
above, and describe fundamental results on them. 



6.2 Identification Coding 

Let $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ be an arbitrary general channel. Denote by $\mathcal{X}$ and $\mathcal{Y}$ the input alphabet and the output alphabet of $\mathbf{W}$, respectively, where $\mathcal{X}$ and $\mathcal{Y}$ are not necessarily finite sets. The identification code for the channel $\mathbf{W}$ is defined in the following way. First, let $\mathcal{N}_n = \{1, 2, \cdots, N_n\}$ be a set of messages to be transmitted. Denote by $\mathcal{P}(\mathcal{X}^n)$ the set of all probability distributions over $\mathcal{X}^n$. A transmitter prepares $N_n$ probability distributions $Q_1, Q_2, \cdots, Q_{N_n} \in \mathcal{P}(\mathcal{X}^n)$. If the transmitter wants to send a message $i \in \mathcal{N}_n$, an encoder $\varphi_n$ generates an input sequence $\mathbf{x} \in \mathcal{X}^n$ randomly subject to the probability distribution $Q_i$, i.e., $Q_i = \varphi_n(i)$. (This kind of encoder $\varphi_n$ is called a stochastic encoder (see §3.8). If $Q_i$ is the one-point distribution, $\varphi_n$ is reduced to an ordinary encoder defined in Chapter 3.) Here, the probability distribution $Q_i = \varphi_n(i)$ is called the codeword of the message $i$, and $\mathcal{C}_n = \{Q_1, Q_2, \cdots, Q_{N_n}\}$, which is a set of probability distributions, is called the code. On the other hand, at the decoder side an $N_n$-tuple of decoders

$$\psi_n = (\psi_n^{(1)}, \psi_n^{(2)}, \cdots, \psi_n^{(N_n)})$$

is prepared. For each $i = 1, 2, \cdots, N_n$ the decoder $\psi_n^{(i)}$ judges that $i \in \mathcal{N}_n$ is transmitted if a channel output $\mathbf{y}$ satisfies $\mathbf{y} \in \mathcal{D}_i$, where

$$\mathcal{D}_1, \mathcal{D}_2, \cdots, \mathcal{D}_{N_n} \tag{6.2.1}$$

are $N_n$ subsets of $\mathcal{Y}^n$ chosen in advance. The decoder $\psi_n^{(i)}$ judges that a message different from $i \in \mathcal{N}_n$ is transmitted if $\mathbf{y} \notin \mathcal{D}_i$. Here, $\mathcal{D}_i$ is called the decoding region of the message $i$. It is not required that $\mathcal{D}_1, \mathcal{D}_2, \cdots, \mathcal{D}_{N_n}$ be disjoint. In the identification coding problem, however, each decoder $\psi_n^{(i)}$ is only interested in transmission of the corresponding message $i$. That is, when $\psi_n^{(i)}$ judges that a message different from $i$ is transmitted, $\psi_n^{(i)}$ is no longer interested in the transmitted message. In this respect the identification code is different from the transmission code defined in Chapter 3. We now define the coding rate of the identification code by

$$r_n = \frac{1}{n} \log \log N_n. \tag{6.2.2}$$

Note that we take the double logarithm in the definition of the coding rate.

Remark 6.2.1. The coding rate defined in (6.2.2) means that we are considering the situation where the number of codewords $N_n$ asymptotically grows as a double-exponential function of the block length $n$, i.e.,






$$N_n \simeq e^{e^{nR}}$$

for some constant $R > 0$. □
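To make the scale of the double logarithm tangible, the following minimal sketch compares the two notions of rate. The function names are hypothetical, natural logarithms are assumed, and code sizes are passed as logarithms because a double-exponential number of messages does not fit in floating point.

```python
import math

def transmission_rate(n, log_M):
    """(1/n) log M_n, given log M_n."""
    return log_M / n

def identification_rate(n, log_N):
    """(1/n) log log N_n as in (6.2.2), given log N_n."""
    return math.log(log_N) / n

n, R = 20, 0.4                        # illustrative values (assumptions)
log_M = n * R                         # transmission code: M_n = e^{nR}
log_N = math.exp(n * R)               # identification code: N_n = e^{e^{nR}}
r_trans = transmission_rate(n, log_M)
r_ident = identification_rate(n, log_N)
```

Both `r_trans` and `r_ident` equal $R$ here, although $\log N_n$ is already about $e^{8}$ at block length 20 while $\log M_n$ is only 8.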



Next, for an arbitrary probability distribution $Q \in \mathcal{P}(\mathcal{X}^n)$ over $\mathcal{X}^n$ we denote by $QW^n$ the probability distribution over $\mathcal{Y}^n$ defined by

$$QW^n(\mathbf{y}) = \sum_{\mathbf{x} \in \mathcal{X}^n} Q(\mathbf{x}) W^n(\mathbf{y}|\mathbf{x}) \qquad (\forall \mathbf{y} \in \mathcal{Y}^n) \tag{6.2.3}$$

and set

$$\mu_n^{(i)} = Q_i W^n(\mathcal{D}_i^c) \qquad (i = 1, 2, \cdots, N_n), \tag{6.2.4}$$

$$\lambda_n^{(j,i)} = Q_j W^n(\mathcal{D}_i) \qquad (j \ne i). \tag{6.2.5}$$

Here, $\mu_n^{(i)}$ means the probability that the decoder $\psi_n^{(i)}$ misjudges that $i$ is not transmitted when $i$ is actually transmitted, and $\lambda_n^{(j,i)}$ the probability that $\psi_n^{(i)}$ misjudges that $i$ is transmitted when a message $j$ ($j \ne i$) is actually transmitted. We define

$$\mu_n = \max_{1 \le i \le N_n} \mu_n^{(i)}, \tag{6.2.6}$$

$$\lambda_n = \max_{1 \le i \ne j \le N_n} \lambda_n^{(j,i)} \tag{6.2.7}$$

and call them the error probability of the first kind and the error probability of the second kind of the identification code, respectively.

Remark 6.2.2. The assumption that the decoding regions $\mathcal{D}_1, \mathcal{D}_2, \cdots, \mathcal{D}_{N_n}$ are not necessarily disjoint is essential in the identification code. In fact, if we require that $\mathcal{D}_1, \mathcal{D}_2, \cdots, \mathcal{D}_{N_n}$ be disjoint, the identification code is reduced to the transmission code treated in Chapter 3. Under such a requirement the error probabilities always satisfy $\lambda_n^{(j,i)} \le \mu_n^{(j)}$ ($i \ne j$), and therefore $\lambda_n \le \mu_n$ holds, so that we have only to consider $\mu_n$ as a measure of coding performance, where $\mu_n$ corresponds to the maximum error probability $\varepsilon_n$ defined in §3.4. □
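The quantities in (6.2.3) to (6.2.7) can be computed directly for a small example. The sketch below is only an illustration under assumptions: the channel is a hypothetical noiseless channel on four symbols with block length one, the three codewords are uniform distributions on overlapping pairs of inputs, and the function names are not from the text.

```python
def output_dist(Q, W):
    """QW(y) = sum_x Q(x) W(y|x); Q maps x -> prob, W maps x -> {y: prob}."""
    out = {}
    for x, qx in Q.items():
        for y, w in W[x].items():
            out[y] = out.get(y, 0.0) + qx * w
    return out

def error_probabilities(codewords, regions, W):
    """Return (mu, lam): the error probabilities of the first and second kind."""
    mu, lam = 0.0, 0.0
    for j, Q in enumerate(codewords):
        QW = output_dist(Q, W)
        for i, region in enumerate(regions):
            mass = sum(QW.get(y, 0.0) for y in region)
            if i == j:
                mu = max(mu, 1.0 - mass)   # message j sent, decoder j rejects
            else:
                lam = max(lam, mass)       # message j sent, decoder i accepts
    return mu, lam

# Hypothetical noiseless channel on {0, 1, 2, 3}.
W = {x: {x: 1.0} for x in range(4)}
supports = [{0, 1}, {1, 2}, {2, 3}]
codewords = [{x: 0.5 for x in s} for s in supports]
regions = supports  # overlapping decoding regions

mu, lam = error_probabilities(codewords, regions, W)
```

Because the decoding regions overlap, the error probability of the second kind is positive even on a noiseless channel, while the error probability of the first kind vanishes. With disjoint regions on the same channel both would be zero, which corresponds to the transmission-code case of Remark 6.2.2.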



Remark 6.2.3. Set $\mathcal{N}_n = \{1, 2, \cdots, N_n\}$. For each $i = 1, 2, \cdots, N_n$ let $H_i$ be the hypothesis testing problem with $\{i\}$ as the null hypothesis and $\mathcal{N}_n - \{i\}$ as the compound alternative hypothesis. The identification code can be regarded as a problem in which we simultaneously test these $N_n$ hypothesis testing problems $H_i$ ($i = 1, 2, \cdots, N_n$) via a channel $\mathbf{W}$. The error probabilities $\mu_n$ and $\lambda_n$ defined by (6.2.6) and (6.2.7) give upper bounds on the error probabilities of the first kind and the second kind of these hypothesis testing problems, respectively. □




Example 6.2.1 (Ahlswede and Dueck [4]). We give the following example to illustrate the meaning of the identification code. Suppose that $N_n$ sailors $S_1, S_2, \cdots, S_{N_n}$ go on a voyage. Each sailor $S_i$ has a wife $L_i$ who is waiting for the return of $S_i$ at a harbor. Now, consider the situation where one day the weather, which has been fine so far, suddenly turns stormy and one of the sailors, say $S_i$, is drowned. The captain wants to let the sailor’s wife $L_i$ know about the tragic event by radio. Of course, the captain need not let the other wives $L_j$ ($j \ne i$) know that it is $S_i$ who is drowned. However, the captain wants to inform them that their respective husbands are not drowned. Such a way of communication, which exactly explains the notion of the identification code, is enough because for each $k = 1, 2, \cdots, N_n$ wife $L_k$ is only interested in the news whether her husband $S_k$ is drowned or not. □



We call a pair $(\varphi_n, \psi_n)$ of an encoder and a decoder with a code of size $N_n$ and the error probabilities of the first kind $\mu_n$ and the second kind $\lambda_n$ the $(n, N_n, \mu_n, \lambda_n)$-code.

The identification coding problem is usually formulated as the maximization of the coding rate $r_n$ subject to the condition that there exists a pair $(\varphi_n, \psi_n)$ of an encoder and a decoder whose error probabilities $\mu_n$ and $\lambda_n$ do not exceed given values. We require that the error probabilities $\mu_n$ and $\lambda_n$ satisfy

$$\limsup_{n\to\infty} \mu_n \le \mu, \tag{6.2.8}$$

$$\limsup_{n\to\infty} \lambda_n \le \lambda \tag{6.2.9}$$

for fixed constants $0 \le \mu < 1$ and $0 \le \lambda < 1$, which is the most standard requirement on the error probabilities. The identification coding problem is formulated under this requirement in the following way:

Definition 6.2.1.

Rate $R$ is $(\mu, \lambda)$-achievable
$\Longleftrightarrow$ There exists an $(n, N_n, \mu_n, \lambda_n)$-code satisfying $\limsup_{n\to\infty} \mu_n \le \mu$, $\limsup_{n\to\infty} \lambda_n \le \lambda$ and $\liminf_{n\to\infty} \frac{1}{n} \log \log N_n \ge R$.

Definition 6.2.2 ($(\mu, \lambda)$-Identification capacity).

$$D(\mu, \lambda | \mathbf{W}) = \sup \{ R \mid R \text{ is } (\mu, \lambda)\text{-achievable} \}.$$

Then, we have the following fundamental theorem describing a relationship between the identification capacity and the channel capacity in the sense of the transmission code.






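Theorem 6.2.1 (Direct theorem: Ahlswede and Dueck [4]). Let $\mathbf{W}$ be an arbitrary channel. If $0 \le \varepsilon \le \mu$ and $0 \le \varepsilon \le \lambda$, it holds that

$$D(\mu, \lambda | \mathbf{W}) \ge C(\varepsilon | \mathbf{W}), \tag{6.2.10}$$

where $C(\varepsilon | \mathbf{W})$ denotes the $\varepsilon$-channel capacity defined in §3.4. □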



We use the following lemma for proving Theorem 6.2.1. 

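Lemma 6.2.1 (Ahlswede and Dueck [4]). Let $\mathcal{M}$ be an arbitrary finite set of size $M = |\mathcal{M}|$. Choose constants $\tau$ and $\kappa$ satisfying $0 < \tau \le \frac{1}{3}$, $0 < \kappa < 1$ and

$$\kappa \log\Bigl(\frac{1}{\tau} - 1\Bigr) \ge \log 2 + 1, \tag{6.2.11}$$

where we use natural logarithms. Define

$$N = \Bigl\lfloor \frac{e^{\tau M}}{Me} \Bigr\rfloor. \tag{6.2.12}$$

Then, there exist $N$ subsets $A_1, A_2, \cdots, A_N \subset \mathcal{M}$ satisfying

$$|A_i| = \lfloor \tau M \rfloor \qquad (i = 1, 2, \cdots, N), \tag{6.2.13}$$

$$|A_i \cap A_j| < \kappa \lfloor \tau M \rfloor \qquad (i \ne j). \tag{6.2.14}$$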
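The statement of the lemma can be explored numerically for a moderate set size. The following sketch checks condition (6.2.11), evaluates the family size (6.2.12), and then builds such a family by random search with rejection. The random search is an illustrative substitute for the counting argument of the proof, not the method of the lemma, and the function names, the seed and the parameter values are assumptions.

```python
import math
import random

def lemma_size(tau, kappa, M):
    """N = floor(e^{tau M} / (M e)) after checking condition (6.2.11)."""
    if not (tau > 0.0 and 0.0 < kappa < 1.0):
        raise ValueError("need tau > 0 and 0 < kappa < 1")
    if kappa * math.log(1.0 / tau - 1.0) < math.log(2.0) + 1.0:
        raise ValueError("condition (6.2.11) is violated")
    return math.floor(math.exp(tau * M) / (M * math.e))

def random_family(tau, kappa, M, seed=0, max_tries=100000):
    """Collect N subsets of {0, ..., M-1}, each of size floor(tau M),
    with pairwise intersections smaller than kappa * floor(tau M)."""
    N = lemma_size(tau, kappa, M)
    size = math.floor(tau * M)
    rng = random.Random(seed)
    family = []
    for _ in range(max_tries):
        if len(family) == N:
            break
        cand = frozenset(rng.sample(range(M), size))
        # Keep the candidate only if it overlaps little with all kept sets.
        if all(len(cand & A) < kappa * size for A in family):
            family.append(cand)
    return family

family = random_family(tau=0.1, kappa=0.8, M=100)
```

With these illustrative values the lemma guarantees 81 subsets of size 10 whose pairwise intersections contain fewer than 8 elements, and the random search finds them almost immediately, since two random 10-subsets of a 100-element set rarely share many elements.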



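Proof. Set $M' = \lfloor \tau M \rfloor$ for simplicity. We first choose an arbitrary $A_1 \subset \mathcal{M}$ satisfying $|A_1| = M'$. Let us evaluate the number of subsets $A$ of $\mathcal{M}$ satisfying $|A| = M'$ and

$$|A_1 \cap A| \ge \kappa M'. \tag{6.2.15}$$

The number $S_{\tau,\kappa}$ of such subsets $A \subset \mathcal{M}$ can be enumerated as

$$S_{\tau,\kappa} = \sum_{i=\lceil \kappa M' \rceil}^{M'} \binom{M'}{i} \binom{M - M'}{M' - i}. \tag{6.2.16}$$

Since $0 < \tau \le \frac{1}{3}$ is assumed, we have $M \ge 3M' \ge 3M' - 2\lceil \kappa M' \rceil$. Hence, it holds that

$$M - M' \ge 2(M' - \lceil \kappa M' \rceil),$$

which leads to

$$\binom{M - M'}{M' - i} \le \binom{M - M'}{M' - \lceil \kappa M' \rceil} \le \binom{M}{M' - \lceil \kappa M' \rceil}$$

for $i = \lceil \kappa M' \rceil, \lceil \kappa M' \rceil + 1, \cdots, M'$. In addition, we trivially have

$$\binom{M'}{i} \le 2^{M'}.$$

Therefore, (6.2.16) can be upper-bounded in the following form:

$$S_{\tau,\kappa} \le M' 2^{M'} \binom{M}{M' - \lceil \kappa M' \rceil} \equiv T_0. \tag{6.2.17}$$

On the other hand, the number of subsets $A$ of $\mathcal{M}$ satisfying $|A| = M'$ is equal to $\binom{M}{M'}$. Accordingly, if $T_0 < \binom{M}{M'}$, there exists a subset $A_2$ of $\mathcal{M}$ satisfying $|A_2| = M'$ and

$$|A_1 \cap A_2| < \kappa M'.$$

Furthermore, since the number of subsets $A \subset \mathcal{M}$ satisfying $|A_2 \cap A| \ge \kappa M'$ and $|A| = M'$ is also at most $T_0$, there exists a subset $A_3$ of $\mathcal{M}$ satisfying $|A_3| = M'$ and

$$|A_1 \cap A_3| < \kappa M', \qquad |A_2 \cap A_3| < \kappa M'$$

provided that $2T_0 < \binom{M}{M'}$. This argument shows that, if $(N^* - 1) T_0 < \binom{M}{M'}$ for a positive integer $N^*$, there exist $N^*$ subsets $A_i \subset \mathcal{M}$ ($i = 1, 2, \cdots, N^*$) satisfying $|A_i| = M'$ and

$$|A_i \cap A_j| < \kappa M' \qquad (i \ne j).$$

Consequently, $N^*$ can be expressed as

$$N^* = \Bigl\lceil \binom{M}{M'} \Big/ T_0 \Bigr\rceil. \tag{6.2.18}$$

By substituting $T_0$ in (6.2.17) into the right-hand side of (6.2.18), we have

$$N^* = \Biggl\lceil \frac{\binom{M}{M'}}{M' 2^{M'} \binom{M}{M' - \lceil \kappa M' \rceil}} \Biggr\rceil,$$

which can be written as

$$N^* = \Biggl\lceil \frac{1}{M' 2^{M'}} \prod_{i=1}^{\lceil \kappa M' \rceil} \frac{M - M' + i}{M' - \lceil \kappa M' \rceil + i} \Biggr\rceil. \tag{6.2.19}$$

We note here that the assumption $0 < \tau \le \frac{1}{3}$ implies $M - M' \ge M' - \lceil \kappa M' \rceil$. This guarantees that

$$\frac{M - M' + i}{M' - \lceil \kappa M' \rceil + i}$$

is monotone decreasing with respect to $i$. Thus, for $i = 1, 2, \cdots, \lceil \kappa M' \rceil$ it follows that

$$\frac{M - M' + i}{M' - \lceil \kappa M' \rceil + i} \ge \frac{M - M' + \lceil \kappa M' \rceil}{M'} \ge \frac{M - \lfloor \tau M \rfloor}{M'} \ge \frac{M - \tau M}{\tau M} = \frac{1}{\tau} - 1.$$

Substitution of this into (6.2.19) yields

$$N^* \ge \frac{1}{M' 2^{M'}} \Bigl(\frac{1}{\tau} - 1\Bigr)^{\lceil \kappa M' \rceil} \ge \frac{1}{M'} e^{M'(\kappa \log(\frac{1}{\tau} - 1) - \log 2)},$$

where the second inequality uses $\frac{1}{\tau} - 1 \ge 1$. By noticing that $M' = \lfloor \tau M \rfloor \ge \tau M - 1$ and $M' \le M$, we obtain

$$N^* \ge \frac{1}{M'} e^{M'} \ge \frac{1}{M} e^{\lfloor \tau M \rfloor} \ge \frac{e^{\tau M - 1}}{M} = \frac{e^{\tau M}}{Me}$$

by virtue of (6.2.11). Hence, we can set $N = \Bigl\lfloor \dfrac{e^{\tau M}}{Me} \Bigr\rfloor$. □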



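Proof of Theorem 6.2.1.

Let $\gamma > 0$ be an arbitrarily small constant and set $R = C(\varepsilon|\mathbf{W}) - \gamma$. It suffices to prove that $R$ is $(\mu, \lambda)$-achievable as a rate of the identification code. First, from Definition 3.4.2 in §3.4, $R$ is $\varepsilon$-achievable as a rate of the transmission code. Therefore, there exists an $(n, M_n, \varepsilon_n)$-code satisfying

$$\limsup_{n\to\infty} \varepsilon_n \le \varepsilon \quad\text{and}\quad \liminf_{n\to\infty} \frac{1}{n} \log M_n \ge R, \tag{6.2.20}$$

where $\varepsilon_n$ denotes the maximum error probability (see Remark 3.4.2 in Chapter 3). Denote by $\mathcal{C}_n = \{\mathbf{u}_1, \mathbf{u}_2, \cdots, \mathbf{u}_{M_n}\}$ ($\mathbf{u}_i \in \mathcal{X}^n$) the code of the $(n, M_n, \varepsilon_n)$-code and let $U_i$ be the decoding region corresponding to $\mathbf{u}_i$ for $i = 1, 2, \cdots, M_n$. Setting $\mathcal{M} = \{1, 2, \cdots, M_n\}$, $M = M_n$, $\tau = \tau_n$ and $\kappa = \kappa_n$, Lemma 6.2.1 tells us that there exist $N_n$ subsets $A_1, A_2, \cdots, A_{N_n}$ of $\mathcal{M}$ satisfying

$$|A_j| = \lfloor \tau_n M_n \rfloor \qquad (j = 1, 2, \cdots, N_n),$$

$$|A_j \cap A_k| < \kappa_n \lfloor \tau_n M_n \rfloor \qquad (j \ne k),$$

where we set

$$\tau_n = \frac{1}{n+3}, \qquad \kappa_n = \frac{2}{\log(n+2)}, \tag{6.2.21}$$

$$N_n = \Bigl\lfloor \frac{e^{\tau_n M_n}}{M_n e} \Bigr\rfloor. \tag{6.2.22}$$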



402 6 Identification Code and Channel Resolvability 



Note that the condition (6.2.11) in Lemma 6.2.1 is still satisfied if we choose $\tau$ and $\kappa$ in (6.2.21) dependent on $n$. Now, we define the subsets $S_j$ ($j = 1, 2, \ldots, N_n$) of $\mathcal{C}_n$ by

$$S_j = \bigcup_{i \in A_j} \{\mathbf{u}_i\}$$

and denote by Qj the uniform distribution over Sj . Let us consider the iden- 
tification code Cn with the decoding region 

$$\mathcal{D}_j = \bigcup_{i \in A_j} U_i$$

corresponding to Qj. Then, the error probability of the first kind can be 
evaluated as follows: 









iE.Aj 






M' 

^ ieA 



where $M'_n = \lfloor \tau_n M_n \rfloor$ and the first inequality follows because $U_i \subset \mathcal{D}_j$ if $i \in A_j$. Therefore, we obtain



< e„. 

1 <J <Nn 



(6.2.23) 



On the other hand, the error probability of the second kind can be evaluated 
for j 7 ^ A: in the following manner: 

= QjW^(Vk) 



iEAn 






iE.Ajf^A}z 



M' 

^ ieAjC\Al 






iEAjC^Ai^ 



iEAj DA^ 



~ M' ^^1 + T^f \^j ^k\^n 

iVJ-n 

< /^n T 5 

where the last inequality follows from \ AjC\Ak\ < and < M^. 

Therefore, we obtain 



An = max < /^n + 

l<j^k<Nn 



(6.2.24) 

Then, due to (6.2.20), (6.2.23) and the assumption $\varepsilon \le \mu$, it follows that

$$\limsup_{n\to\infty} \mu_n \le \mu. \tag{6.2.25}$$

In addition, due to (6.2.20), (6.2.24) and $\lim_{n\to\infty} \kappa_n = 0$, it follows that

$$\limsup_{n\to\infty} \lambda_n \le \varepsilon,$$

which, together with the assumption $\varepsilon \le \lambda$, leads to

$$\limsup_{n\to\infty} \lambda_n \le \lambda. \tag{6.2.26}$$

On the other hand, (6.2.22) implies that

$$N_n \ge \frac{e^{\tau_n M_n}}{2 M_n e}$$

for all sufficiently large n. By recalling that Tn 



1 



77/ “h 3 



and noticing that 



( 6 . 2 . 20 ) guarantees that (Vt 7 > tiq) for an arbitrarily small 

constant 7 > 0 , it holds that 

log A^n > TnMn - log(2MnC) > — (VtI > 72o), 

which yields

$$\liminf_{n\to\infty} \frac{1}{n} \log\log N_n \ge \liminf_{n\to\infty} \frac{1}{n} \log M_n.$$

Consequently, we obtain

$$\liminf_{n\to\infty} \frac{1}{n} \log\log N_n \ge R \tag{6.2.27}$$

from (6.2.20). This completes the proof of the theorem because (6.2.25)-(6.2.27) mean that $R$ is $(\mu, \lambda)$-achievable as a rate of the identification code.

□ 
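The last step of the proof says that the double-logarithmic rate of $N_n$ catches up with the rate of $M_n$. A small numerical sketch makes this visible. It assumes $M_n = e^{nR}$ exactly and uses the lower bound $e^{\tau_n M_n}/(2 M_n e)$ with $\tau_n = 1/(n+3)$; the function name `id_rate` is hypothetical, and $n$ must be large enough for the bound to exceed 1.

```python
import math

def id_rate(n, R):
    """(1/n) log log of the bound e^{tau_n M_n} / (2 M_n e), with M_n = e^{nR}.

    Everything is kept in the log domain, since the bound itself is
    doubly exponential in n and cannot be stored as a float.
    """
    M = math.exp(n * R)
    tau = 1.0 / (n + 3)
    log_N = tau * M - math.log(2 * M * math.e)  # natural log of the bound
    return math.log(log_N) / n

for n in (50, 200, 1000):
    print(n, id_rate(n, R=0.5))
```

The printed values increase toward $R$ as $n$ grows, since the correction is of order $\frac{1}{n}\log(n+3)$.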



Remark 6.2.4. In the identification coding problem we assume not only that the decoding regions $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_{N_n}$ are not necessarily disjoint (Remark 6.2.2) but also that the codewords are defined as the probability distributions $Q_1, Q_2, \ldots, Q_{N_n}$ over $\mathcal{X}^n$. The identification code is essentially different from the transmission code at these two points. If we define, similarly to the transmission code in Chapter 3, input sequences $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{N_n} \in \mathcal{X}^n$ as the codewords instead of the probability distributions, the number of different codewords is at most $|\mathcal{X}|^n$ for the case that $\mathcal{X}$ is a finite set. This means that the size $N_n$ of the codewords cannot grow as a double-exponential function of the block length $n$. That is, the identification coding system is one of the typical coding systems that enable us to drastically increase the coding rate by permitting a stochastic encoder. On the other hand, in the ordinary transmission coding system described in Chapter 3 we cannot increase the achievable coding rate even if we permit the stochastic encoder. In addition, the probability distributions $Q_1, Q_2, \ldots, Q_{N_n}$, which are constructed based on Lemma 6.2.1 in the proof of Theorem 6.2.1, are essentially nonstationary and nonergodic. Hence, the identification coding system is completely different from the ordinary transmission coding with its background of stationarity and ergodicity, such as the weak law of large numbers and the typical sequences. □



Since Theorem 6.2.1 gives the direct theorem on the identification cod- 
ing problem, we need to describe the converse theorem next. However, the 
converse theorem is essentially related to the channel resolvability problem, 
which is an extension of the resolvability problem for sources (§2.2) to the 
approximation problem of the output distributions of channels. Therefore, 
we first explain the channel resolvability problem in the following section. 
The converse theorem on the identification coding problem will be shown 
subsequently using the result on the channel resolvability problem. 



6.3 Channel Resolvability 

In §2.2 we considered the resolvability problem in which we transform the uniform random number $U_{M_n}$ of size $M_n$ so as to approximate another random number $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ in the sense of the vanishing variational distance.

Since the distribution of the random number obtained by transforming the uniform random number $U_{M_n}$ becomes the $M_n$-type (see §2.2), the resolvability problem is nothing but the problem in which we approximate a given random number $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ by a random number of the $M_n$-type.

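The resolvability problem can be extended to the approximation problem of the output distribution of a channel in the following way. First, suppose that a general channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ with arbitrary input and output alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively, is arbitrarily given. Let $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ be the output from the channel $\mathbf{W}$ corresponding to a given input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$. We transform the uniform random number $U_{M_n}$ of size $M_n$ into another input $\tilde{\mathbf{X}} = \{\tilde{X}^n\}_{n=1}^{\infty}$. That is,

$$\tilde{X}^n = \varphi_n(U_{M_n}) \qquad (\mathcal{U}_{M_n} = \{1, 2, \ldots, M_n\}).$$

We consider the problem of how, denoting by $\tilde{\mathbf{Y}} = \{\tilde{Y}^n\}_{n=1}^{\infty}$ the output from the channel $\mathbf{W}$ with an input $\tilde{\mathbf{X}}$, we can choose the size $M_n$ of the uniform random number $U_{M_n}$ and the transform $\varphi_n$ such that the variational distance between $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ and $\tilde{\mathbf{Y}} = \{\tilde{Y}^n\}_{n=1}^{\infty}$ satisfies

$$\lim_{n\to\infty} d(Y^n, \tilde{Y}^n) = 0. \tag{6.3.1}$$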

We would like to realize this approximation of the output distribution by using the uniform random number $U_{M_n}$ of the smallest possible size. In other words, we can regard this approximation problem as the problem of adequately choosing an input $\tilde{X}^n$ of the $M_n$-type such that the channel output $\tilde{Y}^n$ satisfies (6.3.1). This problem is called the channel resolvability problem. The criterion of approximation (6.3.1) can be slightly generalized to

$$\limsup_{n\to\infty} d(Y^n, \tilde{Y}^n) \le \delta, \tag{6.3.2}$$

where $\delta$ is an arbitrary constant satisfying $0 \le \delta < 2$. Under this generalized approximation criterion (6.3.2) we formulate the problem in the following way. We note here that the smallest possible size of the uniform random number depends on a given input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$.

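Definition 6.3.1.

Rate $R$ is $\delta$-achievable for an input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$

$\overset{\text{def}}{\Longleftrightarrow}$ There exists a transform $\tilde{X}^n = \varphi_n(U_{M_n})$ satisfying

$$\limsup_{n\to\infty} d(Y^n, \tilde{Y}^n) \le \delta \quad\text{and}\quad \limsup_{n\to\infty} \frac{1}{n} \log M_n \le R,$$

where $Y^n$ and $\tilde{Y}^n$ denote the channel outputs corresponding to $X^n$ and $\tilde{X}^n$, respectively.

Definition 6.3.2 (Channel $\delta$-resolvability for an input $\mathbf{X}$).

$$S_{\mathbf{X}}(\delta|\mathbf{W}) = \inf \{R \mid R \text{ is } \delta\text{-achievable for an input } \mathbf{X}\}.$$

Remark 6.3.1. If the channel $\mathbf{W}$ is the identity channel, the channel $\delta$-resolvability $S_{\mathbf{X}}(\delta|\mathbf{W})$ coincides with the $\delta$-resolvability $S_r(\delta|\mathbf{X})$ defined in §2.4. □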

We are also interested in the smallest possible size of the uniform random number $U_{M_n}$ that enables us to satisfy the criterion (6.3.2) for any input $\mathbf{X}$ under an adequate choice of the transform $\tilde{X}^n = \varphi_n(U_{M_n})$. We can formulate this problem in the following way:

Definition 6.3.3.

Rate $R$ is $\delta$-achievable $\overset{\text{def}}{\Longleftrightarrow}$ $R$ is $\delta$-achievable for all inputs $\mathbf{X}$.

Definition 6.3.4 (Channel $\delta$-resolvability).

$$S(\delta|\mathbf{W}) = \inf \{R \mid R \text{ is } \delta\text{-achievable}\}.$$

Remark 6.3.2. Clearly, $S(\delta|\mathbf{W}) = \sup_{\mathbf{X}} S_{\mathbf{X}}(\delta|\mathbf{W})$. In addition, $S(\delta|\mathbf{W})$ is a right-continuous and monotone decreasing function of $\delta$. □

We give the following definition for the case $\delta = 0$, which is of particular interest. The quantity below can be regarded as a dual counterpart of the channel capacity $C(\mathbf{W})$ defined in §3.1.

Definition 6.3.5 (Channel resolvability).

$$S(\mathbf{W}) \equiv S(0|\mathbf{W}).$$

We have the following theorem under these definitions. 

Theorem 6.3.1 (Direct theorem: Han and Verdú [46]). Let $\mathbf{W}$ be an arbitrary channel and $\mathbf{X}$ an arbitrary input. Then, it holds that

$$S_{\mathbf{X}}(\delta|\mathbf{W}) \le \overline{I}(\mathbf{X}; \mathbf{Y}) \tag{6.3.3}$$

for all $\delta \ge 0$, where $\mathbf{Y}$ denotes the channel output corresponding to the input $\mathbf{X}$ and $\overline{I}(\mathbf{X}; \mathbf{Y})$ represents the spectral sup-mutual information rate defined in Definition 3.5.2.



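Proof. Let $\gamma > 0$ be an arbitrarily small constant. Since $S_{\mathbf{X}}(\delta|\mathbf{W}) \le S_{\mathbf{X}}(0|\mathbf{W})$ ($\forall \delta \ge 0$) from the definition, it suffices to prove that $R = \overline{I}(\mathbf{X}; \mathbf{Y}) + \gamma$ is 0-achievable ($\delta = 0$) for an arbitrary input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$. To this end, setting

$$M_n = e^{nR}, \tag{6.3.4}$$

we establish the existence of an input $\tilde{\mathbf{X}} = \{\tilde{X}^n\}_{n=1}^{\infty}$ of the $M_n$-type satisfying $\lim_{n\to\infty} d(Y^n, \tilde{Y}^n) = 0$, where $Y^n$ and $\tilde{Y}^n$ denote the channel outputs corresponding to $X^n$ and $\tilde{X}^n$, respectively. In order to establish the existence of such $\tilde{X}^n$, we use the following random coding argument. First, we assign a probability distribution over $\mathcal{X}^n$,

$$P_{[c_1, c_2, \ldots, c_{M_n}]}(c_j) = \frac{1}{M_n} \qquad (j = 1, 2, \ldots, M_n), \tag{6.3.5}$$

to an arbitrary subset $\mathcal{C}_n = \{c_1, c_2, \ldots, c_{M_n}\} \subset \mathcal{X}^n$ of size $M_n$ ($c_1, c_2, \ldots, c_{M_n}$ are not necessarily distinct). Clearly, this gives an $M_n$-type distribution (if $\mathcal{C}_n$ contains $m$ copies of $c_j$, we regard this probability as $\frac{m}{M_n}$). Let $X_1^n, X_2^n, \ldots, X_{M_n}^n$ denote $c_1, c_2, \ldots, c_{M_n}$ independently generated subject to the probability distribution $P_{X^n}$. Then, the output distribution of $\tilde{Y}^n[X_1^n, X_2^n, \ldots, X_{M_n}^n]$, the channel output with $P_{[X_1^n, X_2^n, \ldots, X_{M_n}^n]}$ as the input distribution, is given by

$$P_{\tilde{Y}^n[X_1^n, X_2^n, \ldots, X_{M_n}^n]}(\mathbf{y}) = \frac{1}{M_n} \sum_{j=1}^{M_n} W^n(\mathbf{y}|X_j^n). \tag{6.3.6}$$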



Hence, if

$$\lim_{n\to\infty} \mathrm{E}\, d(Y^n, \tilde{Y}^n[X_1^n, X_2^n, \ldots, X_{M_n}^n]) = 0 \tag{6.3.7}$$

is established, where $\mathrm{E}$ denotes the expectation with respect to $X_1^n, X_2^n, \ldots, X_{M_n}^n$, there must exist at least one realization $X_1^n = c_1, X_2^n = c_2, \ldots, X_{M_n}^n = c_{M_n}$ satisfying

$$\lim_{n\to\infty} d(Y^n, \tilde{Y}^n[c_1, c_2, \ldots, c_{M_n}]) = 0. \tag{6.3.8}$$

Equation (6.3.8) guarantees the theorem because $\tilde{Y}^n[c_1, c_2, \ldots, c_{M_n}]$ is the channel output corresponding to the $M_n$-type distribution $P_{[c_1, c_2, \ldots, c_{M_n}]}$. Hereafter, we prove (6.3.7). We need the following lemma for the proof.

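Lemma 6.3.1. For all $\tau > 0$ it holds that

$$d(P_Z, P_{\tilde{Z}}) \le 2\tau + 2 \Pr\left\{ \log \frac{P_Z(Z)}{P_{\tilde{Z}}(Z)} > \tau \right\}, \tag{6.3.9}$$

where $P_Z$ and $P_{\tilde{Z}}$ are probability distributions over $\mathcal{Z}$.

Proof. First, the variational distance $d(P_Z, P_{\tilde{Z}})$ can be written as

$$\begin{aligned}
d(P_Z, P_{\tilde{Z}}) &= \sum_{z \in \mathcal{Z}} (P_Z(z) - P_{\tilde{Z}}(z))\, \mathbf{1}\left[\log \frac{P_Z(z)}{P_{\tilde{Z}}(z)} > 0\right] + \sum_{z \in \mathcal{Z}} (P_{\tilde{Z}}(z) - P_Z(z))\, \mathbf{1}\left[\log \frac{P_Z(z)}{P_{\tilde{Z}}(z)} \le 0\right] \\
&= 2 \sum_{z \in \mathcal{Z}} (P_Z(z) - P_{\tilde{Z}}(z))\, \mathbf{1}\left[0 < \log \frac{P_Z(z)}{P_{\tilde{Z}}(z)}\right].
\end{aligned}$$

Setting

$$d_1 = \sum_{z \in \mathcal{Z}} (P_Z(z) - P_{\tilde{Z}}(z))\, \mathbf{1}\left[\log \frac{P_Z(z)}{P_{\tilde{Z}}(z)} > \tau\right], \qquad d_2 = \sum_{z \in \mathcal{Z}} (P_Z(z) - P_{\tilde{Z}}(z))\, \mathbf{1}\left[0 < \log \frac{P_Z(z)}{P_{\tilde{Z}}(z)} \le \tau\right],$$

we have

$$d(P_Z, P_{\tilde{Z}}) = 2 d_1 + 2 d_2.$$

The claim of this lemma follows because

$$d_1 \le \sum_{z \in \mathcal{Z}} P_Z(z)\, \mathbf{1}\left[\log \frac{P_Z(z)}{P_{\tilde{Z}}(z)} > \tau\right] = \Pr\left\{ \log \frac{P_Z(Z)}{P_{\tilde{Z}}(Z)} > \tau \right\},$$

$$\begin{aligned}
d_2 &= \sum_{z \in \mathcal{Z}} P_Z(z) \left(1 - \frac{P_{\tilde{Z}}(z)}{P_Z(z)}\right) \mathbf{1}\left[0 < \log \frac{P_Z(z)}{P_{\tilde{Z}}(z)} \le \tau\right] \\
&\le \sum_{z \in \mathcal{Z}} P_Z(z) \log \frac{P_Z(z)}{P_{\tilde{Z}}(z)}\, \mathbf{1}\left[0 < \log \frac{P_Z(z)}{P_{\tilde{Z}}(z)} \le \tau\right] \\
&\le \tau \sum_{z \in \mathcal{Z}} P_Z(z) = \tau,
\end{aligned} \tag{6.3.10}$$

where we use $\log \frac{1}{x} \ge 1 - x$ ($x > 0$) to obtain the first inequality for $d_2$. □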
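A bound of the form $d(P, Q) \le 2\tau + 2\Pr\{\log(P(Z)/Q(Z)) > \tau\}$, with $Z$ distributed according to $P$ and natural logarithms, can be sanity-checked numerically. The sketch below draws random distributions on a small alphabet; the helper names (`variational_distance`, `lemma_bound`, `random_distribution`) are hypothetical and the alphabet size is an arbitrary choice.

```python
import math
import random

def variational_distance(p, q):
    """d(P, Q) = sum_z |P(z) - Q(z)|, so that 0 <= d <= 2."""
    return sum(abs(a - b) for a, b in zip(p, q))

def lemma_bound(p, q, tau):
    """2*tau + 2*Pr{ log(P(Z)/Q(Z)) > tau } with Z ~ P (natural log)."""
    tail = sum(a for a, b in zip(p, q)
               if a > 0 and (b == 0 or math.log(a / b) > tau))
    return 2 * tau + 2 * tail

def random_distribution(size, rng):
    w = [rng.random() + 1e-9 for _ in range(size)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
for _ in range(5):
    p = random_distribution(8, rng)
    q = random_distribution(8, rng)
    for tau in (0.01, 0.1, 1.0):
        assert variational_distance(p, q) <= lemma_bound(p, q, tau) + 1e-12
```

Small $\tau$ makes the first term negligible but lets more mass into the tail probability; the proof of Theorem 6.3.1 exploits exactly this trade-off.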



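Let us return to the proof of (6.3.7). Lemma 6.3.1 tells us that $\mathrm{E}\, d(Y^n, \tilde{Y}^n[X_1^n, X_2^n, \ldots, X_{M_n}^n])$ in (6.3.7) can be upper-bounded as

$$\mathrm{E}\, d(Y^n, \tilde{Y}^n[X_1^n, X_2^n, \ldots, X_{M_n}^n]) \le 2\tau + 2 A_n \tag{6.3.11}$$

for an arbitrarily small $\tau > 0$, where

$$A_n = \mathrm{E} \Pr\left\{ \log \frac{P_{\tilde{Y}^n[X_1^n, \ldots, X_{M_n}^n]}(\tilde{Y}^n[X_1^n, \ldots, X_{M_n}^n])}{P_{Y^n}(\tilde{Y}^n[X_1^n, \ldots, X_{M_n}^n])} > \tau \right\}. \tag{6.3.12}$$

Thus, (6.3.7) follows by establishing $A_n \to 0$ as $n \to \infty$. Hereafter, we establish this. First, we evaluate $A_n$ in the following way: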



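$$\begin{aligned}
A_n &= \sum_{c_1 \in \mathcal{X}^n} \cdots \sum_{c_{M_n} \in \mathcal{X}^n} P_{X^n}(c_1) \cdots P_{X^n}(c_{M_n}) \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{\tilde{Y}^n[c_1, \ldots, c_{M_n}]}(\mathbf{y})\, \mathbf{1}\left[\log \frac{P_{\tilde{Y}^n[c_1, \ldots, c_{M_n}]}(\mathbf{y})}{P_{Y^n}(\mathbf{y})} > \tau\right] \\
&= \frac{1}{M_n} \sum_{j=1}^{M_n} \sum_{c_1 \in \mathcal{X}^n} \cdots \sum_{c_{M_n} \in \mathcal{X}^n} P_{X^n}(c_1) \cdots P_{X^n}(c_{M_n}) \sum_{\mathbf{y} \in \mathcal{Y}^n} W^n(\mathbf{y}|c_j)\, \mathbf{1}\left[\log \frac{P_{\tilde{Y}^n[c_1, \ldots, c_{M_n}]}(\mathbf{y})}{P_{Y^n}(\mathbf{y})} > \tau\right] \\
&= \sum_{c_2 \in \mathcal{X}^n} \cdots \sum_{c_{M_n} \in \mathcal{X}^n} P_{X^n}(c_2) \cdots P_{X^n}(c_{M_n}) \sum_{c_1 \in \mathcal{X}^n} \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{X^n Y^n}(c_1, \mathbf{y}) \\
&\qquad \times \mathbf{1}\left[\frac{1}{M_n} \exp[i_{X^n W^n}(c_1; \mathbf{y})] + \frac{1}{M_n} \sum_{j=2}^{M_n} \exp[i_{X^n W^n}(c_j; \mathbf{y})] > 1 + 2\rho\right] \\
&\le \Pr\left\{\frac{1}{M_n} \exp[i_{X^n W^n}(X^n; Y^n)] > \rho\right\} + \Pr\left\{\frac{1}{M_n} \sum_{j=2}^{M_n} \exp[i_{X^n W^n}(X_j^n; Y^n)] > 1 + \rho\right\},
\end{aligned} \tag{6.3.13}$$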



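where

$$i_{X^n W^n}(\mathbf{x}; \mathbf{y}) = \log \frac{W^n(\mathbf{y}|\mathbf{x})}{P_{Y^n}(\mathbf{y})}, \qquad \rho = \frac{\exp(\tau) - 1}{2} > 0$$

(note that $\rho > 0$ because $\tau > 0$). Here, the third equality follows because $P_{X^n}(c_1), \ldots, P_{X^n}(c_{M_n})$ are identical and $P_{X^n Y^n}(\mathbf{x}, \mathbf{y}) = P_{X^n}(\mathbf{x}) W^n(\mathbf{y}|\mathbf{x})$. The inequality in (6.3.13) follows since $X_1^n, X_2^n, \ldots, X_{M_n}^n$ are independent. By recalling that $M_n = e^{nR} = e^{n(\overline{I}(\mathbf{X}; \mathbf{Y}) + \gamma)}$, the first term on the right-hand side of (6.3.13) can be expressed as

$$\Pr\left\{\frac{1}{M_n} \exp[i_{X^n W^n}(X^n; Y^n)] > \rho\right\} = \Pr\left\{\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) > \overline{I}(\mathbf{X}; \mathbf{Y}) + \gamma + \frac{1}{n} \log \rho\right\},$$

which converges to 0 as $n \to \infty$ from the definition of $\overline{I}(\mathbf{X}; \mathbf{Y})$. Therefore, it suffices to show that the second term on the right-hand side of (6.3.13) converges to 0 as $n \to \infty$. Clearly, the second term is upper-bounded by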



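$$B_n = \Pr\left\{\frac{1}{M_n} \sum_{j=1}^{M_n} \exp[i_{X^n W^n}(X_j^n; Y^n)] > 1 + \rho\right\}. \tag{6.3.14}$$

If all of the terms

$$\exp[i_{X^n W^n}(X_i^n; Y^n)] \qquad (i = 1, 2, \ldots, M_n)$$

in the sum in (6.3.14) were independent, then in view of $\mathrm{E} \exp[i_{X^n W^n}(X_i^n; Y^n)] = 1$ we would have $B_n \to 0$ as $n \to \infty$ by using $\rho > 0$ and Chebyshev's inequality. However, this argument does not immediately apply because all the terms contain $Y^n$ in common. We now analyze $B_n$ in detail. For each $\mathbf{y} \in \mathcal{Y}^n$ and $j = 1, 2, \ldots, M_n$ set

$$V_{n,j}(\mathbf{y}) = \exp[i_{X^n W^n}(X_j^n; \mathbf{y})], \tag{6.3.15}$$

$$Z_{n,j}(\mathbf{y}) = V_{n,j}(\mathbf{y})\, \mathbf{1}[V_{n,j}(\mathbf{y}) \le M_n],$$

$$G_{M_n}(\mathbf{y}) = \frac{1}{M_n} \sum_{j=1}^{M_n} V_{n,j}(\mathbf{y}), \tag{6.3.16}$$

$$T_{M_n}(\mathbf{y}) = \frac{1}{M_n} \sum_{j=1}^{M_n} Z_{n,j}(\mathbf{y}).$$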

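Notice that for an arbitrarily fixed $\mathbf{y} \in \mathcal{Y}^n$, $V_{n,1}(\mathbf{y}), \ldots, V_{n,M_n}(\mathbf{y})$ are independent and so are $Z_{n,1}(\mathbf{y}), \ldots, Z_{n,M_n}(\mathbf{y})$. In view of (6.3.15) and (6.3.16), $B_n$ is equal to the expectation of

$$B_n(\mathbf{y}) = \Pr\{G_{M_n}(\mathbf{y}) > 1 + \rho\} \tag{6.3.17}$$

with respect to the probability distribution $P_{Y^n}(\mathbf{y})$, i.e.,

$$B_n = \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n}(\mathbf{y}) B_n(\mathbf{y}). \tag{6.3.18}$$

We notice that $B_n(\mathbf{y})$ can be upper-bounded as follows: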



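$$B_n(\mathbf{y}) \le \Pr\{T_{M_n}(\mathbf{y}) \ne G_{M_n}(\mathbf{y})\} + \Pr\{T_{M_n}(\mathbf{y}) > 1 + \rho\}. \tag{6.3.19}$$

The first term on the right-hand side of (6.3.19) is evaluated as

$$\begin{aligned}
\Pr\{T_{M_n}(\mathbf{y}) \ne G_{M_n}(\mathbf{y})\} &\le \Pr\left\{\bigcup_{j=1}^{M_n} [Z_{n,j}(\mathbf{y}) \ne V_{n,j}(\mathbf{y})]\right\} \\
&\le \sum_{j=1}^{M_n} \Pr\{Z_{n,j}(\mathbf{y}) \ne V_{n,j}(\mathbf{y})\} \\
&= M_n \Pr\{V_{n,1}(\mathbf{y}) > M_n\}.
\end{aligned}$$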

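Thus, by taking the expectation of both sides with respect to $P_{Y^n}(\mathbf{y})$, it follows that

$$\begin{aligned}
\sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n}(\mathbf{y}) \Pr\{T_{M_n}(\mathbf{y}) \ne G_{M_n}(\mathbf{y})\}
&\le M_n \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n}(\mathbf{y}) \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x})\, \mathbf{1}[\exp[i_{X^n W^n}(\mathbf{x}; \mathbf{y})] > M_n] \\
&= M_n \sum_{\mathbf{x} \in \mathcal{X}^n} \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{X^n Y^n}(\mathbf{x}, \mathbf{y}) \exp[-i_{X^n W^n}(\mathbf{x}; \mathbf{y})]\, \mathbf{1}[\exp[i_{X^n W^n}(\mathbf{x}; \mathbf{y})] > M_n] \\
&\le \sum_{\mathbf{x} \in \mathcal{X}^n} \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{X^n Y^n}(\mathbf{x}, \mathbf{y})\, \mathbf{1}[\exp[i_{X^n W^n}(\mathbf{x}; \mathbf{y})] > M_n] \\
&= \Pr\left\{\frac{1}{n} \log \frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} > \overline{I}(\mathbf{X}; \mathbf{Y}) + \gamma\right\},
\end{aligned} \tag{6.3.20}$$

which converges to 0 as $n \to \infty$ from the definition of $\overline{I}(\mathbf{X}; \mathbf{Y})$. Therefore, in order to complete the proof, we have only to show that the expectation with respect to $P_{Y^n}(\mathbf{y})$ of the second term on the right-hand side of (6.3.19) converges to 0 as $n \to \infty$. We first notice that $\mathrm{E}(T_{M_n}(\mathbf{y}))$, the expectation of $T_{M_n}(\mathbf{y})$ with respect to $X_1^n, X_2^n, \ldots, X_{M_n}^n$, can be evaluated as

$$\begin{aligned}
\mathrm{E}(T_{M_n}(\mathbf{y})) &= \mathrm{E}(Z_{n,1}(\mathbf{y})) \\
&= \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n}(\mathbf{x}) \exp[i_{X^n W^n}(\mathbf{x}; \mathbf{y})]\, \mathbf{1}[\exp[i_{X^n W^n}(\mathbf{x}; \mathbf{y})] \le M_n] \\
&= \sum_{\mathbf{x} \in \mathcal{X}^n} P_{X^n|Y^n}(\mathbf{x}|\mathbf{y})\, \mathbf{1}[\exp[i_{X^n W^n}(\mathbf{x}; \mathbf{y})] \le M_n] \\
&\le 1.
\end{aligned}$$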

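By recalling that $X_1^n, X_2^n, \ldots, X_{M_n}^n$ are independently generated subject to the identical distribution $P_{X^n}$, Chebyshev's inequality implies that

$$\Pr\{T_{M_n}(\mathbf{y}) > 1 + \rho\} \le \Pr\{T_{M_n}(\mathbf{y}) - \mathrm{E}(T_{M_n}(\mathbf{y})) > \rho\} \le \frac{1}{\rho^2} \mathrm{V}(T_{M_n}(\mathbf{y})) = \frac{1}{\rho^2 M_n} \mathrm{V}(Z_{n,1}(\mathbf{y})),$$

where $\mathrm{V}(\cdot)$ denotes the variance, the first inequality follows from $\mathrm{E}(T_{M_n}(\mathbf{y})) \le 1$, the second inequality is Chebyshev's inequality, and the last equality follows because $Z_{n,1}(\mathbf{y}), Z_{n,2}(\mathbf{y}), \ldots, Z_{n,M_n}(\mathbf{y})$ are independent and subject to an identical distribution. Therefore, it follows that

$$\begin{aligned}
\Pr\{T_{M_n}(Y^n) > 1 + \rho\} &= \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n}(\mathbf{y}) \Pr\{T_{M_n}(\mathbf{y}) > 1 + \rho\} \\
&\le \frac{1}{\rho^2 M_n} \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n}(\mathbf{y}) \mathrm{V}(Z_{n,1}(\mathbf{y})) \\
&\le \frac{1}{\rho^2 M_n} \mathrm{E}(Z_{n,1}^2(Y^n)),
\end{aligned} \tag{6.3.22}$$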



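where $\mathrm{E}(Z_{n,1}^2(Y^n))$ denotes the expectation of $Z_{n,1}^2(Y^n)$ with respect to $X_1^n$ and $Y^n$. We now evaluate $\frac{1}{M_n} \mathrm{E}(Z_{n,1}^2(Y^n))$. First, we have

$$\begin{aligned}
\frac{1}{M_n} \mathrm{E}(Z_{n,1}^2(Y^n)) &= \frac{1}{M_n} \sum_{\mathbf{x} \in \mathcal{X}^n} \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{X^n}(\mathbf{x}) P_{Y^n}(\mathbf{y}) \exp[2 i_{X^n W^n}(\mathbf{x}; \mathbf{y})]\, \mathbf{1}[\exp[i_{X^n W^n}(\mathbf{x}; \mathbf{y})] \le M_n] \\
&= \mathrm{E}\left(\frac{1}{M_n} \exp[i_{X^n W^n}(X^n; Y^n)]\, \mathbf{1}\left[\frac{1}{M_n} \exp[i_{X^n W^n}(X^n; Y^n)] \le 1\right]\right),
\end{aligned} \tag{6.3.23}$$

where the last expectation is taken with respect to the joint distribution $P_{X^n Y^n}$. Then, by recalling $M_n = e^{n(\overline{I}(\mathbf{X}; \mathbf{Y}) + \gamma)}$, the right-hand side of (6.3.23) is upper-bounded in the following way:

$$\begin{aligned}
&\mathrm{E}\left(\frac{1}{M_n} \exp[i_{X^n W^n}(X^n; Y^n)]\, \mathbf{1}\left[\frac{1}{M_n} \exp[i_{X^n W^n}(X^n; Y^n)] \le 1\right]\right) \\
&\quad = \mathrm{E}\left(\frac{1}{M_n} \exp[i_{X^n W^n}(X^n; Y^n)]\, \mathbf{1}\left[\frac{1}{M_n} \exp[i_{X^n W^n}(X^n; Y^n)] \le \exp\left(-\frac{n\gamma}{2}\right)\right]\right) \\
&\qquad + \mathrm{E}\left(\frac{1}{M_n} \exp[i_{X^n W^n}(X^n; Y^n)]\, \mathbf{1}\left[\exp\left(-\frac{n\gamma}{2}\right) < \frac{1}{M_n} \exp[i_{X^n W^n}(X^n; Y^n)] \le 1\right]\right) \\
&\quad \le \exp\left(-\frac{n\gamma}{2}\right) + \Pr\left\{\frac{1}{n} \log \frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} > \overline{I}(\mathbf{X}; \mathbf{Y}) + \frac{\gamma}{2}\right\}.
\end{aligned} \tag{6.3.24}$$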

From the definition of $\overline{I}(\mathbf{X}; \mathbf{Y})$, the second term on the right-hand side of (6.3.24) converges to 0 as $n \to \infty$. In addition, we clearly have $\exp\left(-\frac{n\gamma}{2}\right) \to 0$ as $n \to \infty$. Hence, we obtain

$$\frac{1}{M_n} \mathrm{E}(Z_{n,1}^2(Y^n)) \to 0 \quad \text{as } n \to \infty,$$

which, together with (6.3.22), yields

$$\Pr\{T_{M_n}(Y^n) > 1 + \rho\} \to 0 \quad \text{as } n \to \infty.$$

Thus, the expectation of the second term on the right-hand side of (6.3.19) with respect to $P_{Y^n}(\mathbf{y})$ turns out to satisfy

$$\sum_{\mathbf{y} \in \mathcal{Y}^n} P_{Y^n}(\mathbf{y}) \Pr\{T_{M_n}(\mathbf{y}) > 1 + \rho\} = \Pr\{T_{M_n}(Y^n) > 1 + \rho\} \to 0 \quad \text{as } n \to \infty.$$

This completes the proof of the theorem. □
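The random coding construction in this proof can be imitated at a very small block length. The sketch below is only an illustration under stated assumptions: the channel is a binary symmetric channel with crossover probability `p`, the target input is uniform on $\{0,1\}^n$ (so the target output is uniform as well), and `M` codewords are drawn independently and uniformly. The function name is hypothetical, and finite-$n$ numbers say nothing rigorous about the asymptotic rate.

```python
import random

def variational_distance_to_target(n, p, M, seed=0):
    """d(Y^n, Y~^n) for a BSC(p): the target output is uniform on {0,1}^n,
    the approximating input is uniform over M randomly drawn codewords."""
    rng = random.Random(seed)
    codewords = [rng.getrandbits(n) for _ in range(M)]  # may contain repeats
    total = 0.0
    for y in range(2 ** n):
        induced = 0.0
        for c in codewords:
            k = bin(c ^ y).count("1")          # Hamming distance between c and y
            induced += (p ** k) * ((1 - p) ** (n - k))
        induced /= M                            # (1/M) * sum_j W^n(y | c_j)
        total += abs(induced - 2.0 ** (-n))
    return total

for M in (2, 16, 128, 1024):
    print(M, variational_distance_to_target(n=6, p=0.1, M=M))
```

Repeated codewords are allowed on purpose, matching the remark in the proof that $c_1, \ldots, c_{M_n}$ need not be distinct.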



Remark 6.3.3. The inequality (6.3.3) in Theorem 6.3.1 is not always satisfied with equality. For example, set $\mathcal{X} = \{0, 1, e\}$ and $\mathcal{Y} = \{0, 1\}$ and consider the stationary memoryless channel $\mathbf{W} = \{W\}$ given in Fig. 6.1. If the input $\mathbf{X} = \{X_i\}_{i=1}^{\infty}$ is defined as the stationary memoryless source with $P_X(0) = P_X(1) = 1/2$, it holds that $\overline{I}(\mathbf{X}; \mathbf{Y}) = \log 2$. In addition, the output $\mathbf{Y}$ becomes the stationary memoryless process satisfying $P_Y(0) = P_Y(1) = 1/2$. However, this output $\mathbf{Y}$ can also be realized by the input $\tilde{\mathbf{X}}$ satisfying $P_{\tilde{X}}(e) = 1$. Since $\tilde{\mathbf{X}}$ is clearly the $M_n$-type source with $M_n = 1$, we have $S_{\mathbf{X}}(\delta|\mathbf{W}) = 0$. Therefore, it holds that $S_{\mathbf{X}}(\delta|\mathbf{W}) < \overline{I}(\mathbf{X}; \mathbf{Y})$ for all $\delta \ge 0$ in this case. It turns out, however, that $S(\delta|\mathbf{W}) = \sup_{\mathbf{X}} \overline{I}(\mathbf{X}; \mathbf{Y}) = C(\mathbf{W})$ for all $0 \le \delta < 1$ from Theorem 6.4.2 below since this channel satisfies the strong converse property (see Example 6.4.1). □
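The example in this remark can be checked by direct computation on a single channel use. Fig. 6.1 is not reproduced in this text, so the transition probabilities below are an assumption consistent with the remark: inputs 0 and 1 are transmitted noiselessly and input $e$ produces 0 or 1 with probability 1/2 each.

```python
import math

# Assumed transition probabilities W(y|x); the actual Fig. 6.1 is not available here.
W = {"0": {"0": 1.0, "1": 0.0},
     "1": {"0": 0.0, "1": 1.0},
     "e": {"0": 0.5, "1": 0.5}}

def output_dist(p_x):
    """Output distribution P_Y induced by the input distribution p_x."""
    return {y: sum(p_x[x] * W[x][y] for x in p_x) for y in ("0", "1")}

def mutual_information(p_x):
    """I(X;Y) in nats for a single channel use."""
    p_y = output_dist(p_x)
    return sum(p_x[x] * W[x][y] * math.log(W[x][y] / p_y[y])
               for x in p_x for y in p_y if p_x[x] * W[x][y] > 0)

uniform_bits = {"0": 0.5, "1": 0.5}
always_e = {"e": 1.0}
print(output_dist(uniform_bits), mutual_information(uniform_bits))
print(output_dist(always_e), mutual_information(always_e))
```

Both inputs induce the same output distribution, yet the deterministic input needs no randomness at all, which is the gap between $S_{\mathbf{X}}(\delta|\mathbf{W})$ and $\overline{I}(\mathbf{X}; \mathbf{Y})$ that the remark points out.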



Remark 6.3.4. We call a stationary memoryless channel $\mathbf{W} = \{W\}$ with finite input and output alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively, the full-rank channel if $\{W(\cdot|x)\}_{x \in \mathcal{X}}$ are linearly independent as a set of vectors. It is shown that the full-rank channel satisfies

$$S_{\mathbf{X}}(0|\mathbf{W}) = \overline{I}(\mathbf{X}; \mathbf{Y})$$

for all inputs $\mathbf{X}$ (Han and Verdú [47]). Therefore, $S(\mathbf{W}) = S(0|\mathbf{W}) = \sup_{\mathbf{X}} \overline{I}(\mathbf{X}; \mathbf{Y})$. However, clearly, the channel given in Remark 6.3.3 above is not the full-rank channel. □

Fig. 6.1.



Theorem 6.3.1 and Remark 6.3.2 immediately yield the following theorem:

Theorem 6.3.2 (Direct theorem: Han and Verdú [46]). Let $\mathbf{W}$ be an arbitrary channel. Then, it holds that

$$S(\delta|\mathbf{W}) \le \sup_{\mathbf{X}} \overline{I}(\mathbf{X}; \mathbf{Y}) \tag{6.3.25}$$

for all $\delta \ge 0$.* □



We have established Theorem 6.2.1 on the identification capacity and Theo- 
rem 6.3.2 on the channel resolvability. In fact, both of these two theorems are 
the direct theorems. Though there seems no relationship between these two 
theorems, they are essentially related at a deep level; each makes up for the 
other. The following section is devoted to a description of this relationship. 



6.4 Identification Capacity Theorem and Channel 
Resolvability Theorem 

In §6.2 and §6.3 we have obtained the direct theorems on the identification capacity and the channel resolvability. On the other hand, in order to develop the converse theorem, the following lemma, which provides a key link between Theorem 6.2.1 and Theorem 6.3.2, plays a fundamental role. Here, we need the assumption that $\mathcal{X}$ is a finite input alphabet. This lemma means that the direct theorem on the channel resolvability becomes the converse theorem on the identification coding and vice versa.

* Theorem 6 in Han and Verdú [46] claims that $S(\mathbf{W}) = S(0|\mathbf{W}) = \sup_{\mathbf{X}} \overline{I}(\mathbf{X}; \mathbf{Y})$ always holds for any channel $\mathbf{W}$ if its input alphabet is finite. However, the proof in [46] contains a mistake in part. Therefore, so far there is no result that shows whether this equality holds or not.



Lemma 6.4.1 (Han and Verdú [46]). Suppose that $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ is an arbitrary channel with a finite input alphabet $\mathcal{X}$. Then, for all $\delta \ge 0$, $\mu \ge 0$ and $\lambda \ge 0$ satisfying $\delta < 1 - \mu - \lambda$ it holds that

$$D(\mu, \lambda|\mathbf{W}) \le S(\delta|\mathbf{W}). \tag{6.4.1}$$



Proof. Suppose that $R_1$ is $(\mu, \lambda)$-achievable as an arbitrary rate of the identification coding. Then, there exists an $(n, N_n, \mu_n, \lambda_n)$-code satisfying

$$\limsup_{n\to\infty} \mu_n \le \mu, \qquad \limsup_{n\to\infty} \lambda_n \le \lambda, \tag{6.4.2}$$

$$\liminf_{n\to\infty} \frac{1}{n} \log\log N_n \ge R_1 \tag{6.4.3}$$

due to the definition of the $(\mu, \lambda)$-achievability. Letting $Q_1, Q_2, \ldots, Q_{N_n}$ be the codewords (probability distributions over $\mathcal{X}^n$) of this identification code and $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_{N_n} \subset \mathcal{Y}^n$ the corresponding decoding regions, for any $j \ne k$ it follows that

$$d(Q_j W^n, Q_k W^n) \ge 2(Q_j W^n(\mathcal{D}_j) - Q_k W^n(\mathcal{D}_j)) \ge 2(1 - \mu_n - \lambda_n).$$

Thus, we obtain

$$\liminf_{n\to\infty} d(Q_j W^n, Q_k W^n) \ge 2\left(1 - \limsup_{n\to\infty} \mu_n - \limsup_{n\to\infty} \lambda_n\right) \ge 2(1 - \mu - \lambda) \tag{6.4.4}$$

from (6.4.2). On the other hand, suppose that $R_2$ is $\delta$-achievable as an arbitrary rate of the channel resolvability. Then, there exist probability distributions $\tilde{Q}_j$ ($j = 1, 2, \ldots, N_n$) of the $M_n$-type over $\mathcal{X}^n$ satisfying

$$\limsup_{n\to\infty} d(Q_j W^n, \tilde{Q}_j W^n) \le \delta, \tag{6.4.5}$$

$$\limsup_{n\to\infty} \frac{1}{n} \log M_n \le R_2 \tag{6.4.6}$$

for all $j = 1, 2, \ldots, N_n$. We assume here that there exist $j$ and $k$ ($j \ne k$) such that $\tilde{Q}_j = \tilde{Q}_k$. Since we have

$$d(Q_j W^n, Q_k W^n) \le d(Q_j W^n, \tilde{Q}_j W^n) + d(Q_k W^n, \tilde{Q}_j W^n) = d(Q_j W^n, \tilde{Q}_j W^n) + d(Q_k W^n, \tilde{Q}_k W^n)$$

from the triangle inequality on the variational distance $d$, (6.4.5) yields

$$\liminf_{n\to\infty} d(Q_j W^n, Q_k W^n) \le \limsup_{n\to\infty} d(Q_j W^n, \tilde{Q}_j W^n) + \limsup_{n\to\infty} d(Q_k W^n, \tilde{Q}_k W^n) \le 2\delta. \tag{6.4.7}$$



However, (6.4.7) contradicts (6.4.4) because $\delta < 1 - \mu - \lambda$ from the assumption of the lemma. Therefore, $\tilde{Q}_j \ne \tilde{Q}_k$ must be satisfied if $j \ne k$, i.e., all of $\tilde{Q}_1, \tilde{Q}_2, \ldots, \tilde{Q}_{N_n}$ must be distinct as the $M_n$-type distributions over $\mathcal{X}^n$. We note here that there are at most $|\mathcal{X}|^{n M_n}$ distributions of the $M_n$-type over $\mathcal{X}^n$. Hence, it must hold that

$$N_n \le |\mathcal{X}|^{n M_n} \tag{6.4.8}$$

(we use here the assumption that $\mathcal{X}$ is a finite input alphabet). Since this inequality leads to

$$\log N_n \le n M_n \log |\mathcal{X}|,$$

and therefore,

$$\frac{1}{n} \log\log N_n \le \frac{1}{n} \log M_n + \frac{1}{n} \log n + \frac{1}{n} \log\log |\mathcal{X}|,$$

we obtain

$$\liminf_{n\to\infty} \frac{1}{n} \log\log N_n \le \limsup_{n\to\infty} \frac{1}{n} \log M_n.$$

Then, (6.4.3) and (6.4.6) imply

$$R_1 \le R_2.$$

The claim of the lemma follows because $R_1$ is an arbitrary $(\mu, \lambda)$-achievable rate of the identification coding and $R_2$ is an arbitrary $\delta$-achievable rate of the channel resolvability. □
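The counting step of this proof can be followed numerically. The sketch below evaluates $\frac{1}{n}\log\log$ of the count $|\mathcal{X}|^{n M_n}$ in the log domain, assuming $M_n = e^{n R_2}$ exactly; the function name `loglog_bound` is hypothetical.

```python
import math

def loglog_bound(n, R2, alphabet_size):
    """(1/n) log log |X|^{n M_n} with M_n = e^{n R2}.

    Uses log(n * M_n * log|X|) = log M_n + log n + log log |X|,
    so the doubly exponential count is never formed explicitly.
    """
    log_M = n * R2
    return (log_M + math.log(n) + math.log(math.log(alphabet_size))) / n

for n in (10, 100, 10_000):
    print(n, loglog_bound(n, R2=0.3, alphabet_size=2))
```

The two correction terms vanish after division by $n$, which is why the double-exponential rate of $N_n$ cannot exceed the exponential rate of $M_n$.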

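By combining Theorem 6.2.1, Theorem 6.3.2 and Lemma 6.4.1 and recalling Theorem 3.2.1 and

$$C(\mathbf{W}) = C(0|\mathbf{W}) \le C(\varepsilon|\mathbf{W}) \qquad (0 \le \forall \varepsilon < 1),$$

we have the following theorem claiming a relationship on the channel capacity, the identification capacity and the channel resolvability, where $C(\mathbf{W})$ denotes the channel capacity of a channel $\mathbf{W}$ (Definition 3.1.2).

Theorem 6.4.1. Suppose that $\mathbf{W}$ is an arbitrary channel with a finite input alphabet $\mathcal{X}$. Then, for all $\varepsilon \ge 0$, $\delta \ge 0$, $\mu \ge 0$ and $\lambda \ge 0$ satisfying

$$\varepsilon \le \mu, \qquad \varepsilon \le \lambda, \qquad \delta < 1 - \mu - \lambda,$$

it holds that

$$\sup_{\mathbf{X}} \underline{I}(\mathbf{X}; \mathbf{Y}) \le C(\varepsilon|\mathbf{W}) \le D(\mu, \lambda|\mathbf{W}) \le S(\delta|\mathbf{W}) \le \sup_{\mathbf{X}} \overline{I}(\mathbf{X}; \mathbf{Y}). \tag{6.4.9}$$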

It is important to consider what kind of channel W satisfies the inequal- 
ities (6.4.9) in Theorem 6.4.1 with equality. The following theorem gives an 
answer to this question. 



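Theorem 6.4.2 (Fundamental theorem: Han and Verdú [46]). If a channel $\mathbf{W}$ with a finite input alphabet $\mathcal{X}$ satisfies the strong converse property, for all $\varepsilon \ge 0$, $\delta \ge 0$, $\mu \ge 0$ and $\lambda \ge 0$ satisfying

$$\varepsilon < 1, \qquad \delta < 1, \qquad \mu + \lambda < 1 \tag{6.4.10}$$

it holds that

$$\sup_{\mathbf{X}} \underline{I}(\mathbf{X}; \mathbf{Y}) = C(\varepsilon|\mathbf{W}) = D(\mu, \lambda|\mathbf{W}) = S(\delta|\mathbf{W}) = \sup_{\mathbf{X}} \overline{I}(\mathbf{X}; \mathbf{Y}). \tag{6.4.11}$$

Proof. Since Theorem 3.5.1 tells us that

$$\sup_{\mathbf{X}} \underline{I}(\mathbf{X}; \mathbf{Y}) = \sup_{\mathbf{X}} \overline{I}(\mathbf{X}; \mathbf{Y})$$

for a channel $\mathbf{W}$ satisfying the strong converse property, the claim of the theorem immediately follows from Theorem 6.4.1. □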

Theorem 6.4.2 immediately yields the following two corollaries: 

Corollary 6.4.1 (Identification capacity). If a channel $\mathbf{W}$ with a finite input alphabet $\mathcal{X}$ satisfies the strong converse property, for all $\mu \ge 0$ and $\lambda \ge 0$ satisfying $\mu + \lambda < 1$ we have

$$D(\mu, \lambda|\mathbf{W}) = C(\mathbf{W}). \tag{6.4.12}$$

Corollary 6.4.2 (Channel resolvability). If a channel $\mathbf{W}$ with a finite input alphabet $\mathcal{X}$ satisfies the strong converse property, for all $\delta$ with $0 \le \delta < 1$ we have

$$S(\delta|\mathbf{W}) = C(\mathbf{W}). \tag{6.4.13}$$

Example 6.4.1. Stationary memoryless channels $\mathbf{W} = \{W\}$ with finite input and output alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively, satisfy the strong converse property. This can be verified by setting $c(x) = \Gamma = 1$ for all $x \in \mathcal{X}$ in Theorem 3.7.2. Therefore, it holds that

$$D(\mu, \lambda|\mathbf{W}) = S(\delta|\mathbf{W}) = C(\mathbf{W}) = \max_{X} I(X; Y)$$

for all $\mu \ge 0$ and $\lambda \ge 0$ satisfying $\mu + \lambda < 1$ and $0 \le \delta < 1$. □



Example 6.4.2. Let us consider the mixed channel $\mathbf{W}$ of the binary symmetric stationary memoryless channels $\mathbf{W}_1$ and $\mathbf{W}_2$ given at the end of Example 3.4.2 in §3.4. Here, $\mathbf{W}_1$ and $\mathbf{W}_2$ are assumed to be mixed according to weights $\alpha_1 > 0$ and $\alpha_2 > 0$ with $\alpha_1 + \alpha_2 = 1$. Since the $\varepsilon$-channel capacity $C(\varepsilon|\mathbf{W})$ is given as illustrated in Fig. 3.3, we have

$$C(\varepsilon|\mathbf{W}) = \log 2 - h(p_2) > C(\mathbf{W}) = \log 2 - h(p_1)$$

for all $1 > \varepsilon \ge \alpha_1$ (note that $h(p_1) > h(p_2)$ is assumed). Thus, for the case of $0 < \alpha_1 < \frac{1}{2}$, where we can choose $\mu$ and $\lambda$ such that $\mu \ge \alpha_1$, $\lambda \ge \alpha_1$ and $\mu + \lambda < 1$, Theorem 6.4.1 indicates that the $(\mu, \lambda)$-identification capacity $D(\mu, \lambda|\mathbf{W})$ satisfies

$$D(\mu, \lambda|\mathbf{W}) > C(\mathbf{W}),$$

that is, it is strictly greater than the channel capacity $C(\mathbf{W})$. In other words, the $(\mu, \lambda)$-identification capacity $D(\mu, \lambda|\mathbf{W})$ does not always coincide with the channel capacity $C(\mathbf{W})$. In fact, since we have

$$\sup_{\mathbf{X}} \overline{I}(\mathbf{X}; \mathbf{Y}) = \log 2 - h(p_2)$$

in this case, it holds that

$$D(\mu, \lambda|\mathbf{W}) = S(\delta|\mathbf{W}) = \log 2 - h(p_2) > C(\mathbf{W})$$

for all $\mu \ge \alpha_1$, $\lambda \ge \alpha_1$ ($\mu + \lambda < 1$) and $0 \le \delta < 1 - 2\alpha_1$. This means that the channel resolvability $S(\mathbf{W})$ is also greater than $C(\mathbf{W})$. □
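To put numbers on this example, the following sketch evaluates the two levels $\log 2 - h(p_1)$ and $\log 2 - h(p_2)$ in nats. It assumes, as a reading of the example and of the step shape attributed to Fig. 3.3 (which is not reproduced here), that the $\varepsilon$-capacity jumps from the lower level to the higher one at $\varepsilon = \alpha_1$. The function names and the sample values of $p_1$, $p_2$, $\alpha_1$ are hypothetical.

```python
import math

def h(p):
    """Binary entropy in nats."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def mixed_bsc_eps_capacity(eps, p1, p2, alpha1):
    """Assumed step function for the two-component mixture with h(p1) > h(p2):
    the worse component W1 (weight alpha1) limits the rate only while eps < alpha1."""
    worse = math.log(2) - h(p1)
    better = math.log(2) - h(p2)
    return better if eps >= alpha1 else worse

p1, p2, alpha1 = 0.2, 0.05, 0.3
print("C(W)       =", mixed_bsc_eps_capacity(0.0, p1, p2, alpha1))
print("C(eps | W) =", mixed_bsc_eps_capacity(alpha1, p1, p2, alpha1))
```

With $\alpha_1 = 0.3 < 1/2$, choosing $\mu = \lambda = 0.3$ satisfies $\mu + \lambda < 1$, so the higher level is the one the example associates with the identification capacity.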



Remark 6.4.1. What happens if we drop the condition $\mu + \lambda < 1$ ($\mu \ge 0$, $\lambda \ge 0$) from Corollary 6.4.1? That is, what does the identification capacity $D(\mu, \lambda|\mathbf{W})$ become under the condition $\mu + \lambda \ge 1$? As far as a channel $\mathbf{W}$ does not "degenerate," we can adequately choose an input sequence $\mathbf{x}_n \in \mathcal{X}^n$ ($n = 1, 2, \ldots$) and a set $S_n \subset \mathcal{Y}^n$ ($n = 1, 2, \ldots$) satisfying

$$\limsup_{n\to\infty} W^n(S_n^c|\mathbf{x}_n) \le \mu,$$

$$\limsup_{n\to\infty} W^n(S_n|\mathbf{x}_n) \le \lambda.$$

If we define all of the codewords (probability distributions over $\mathcal{X}^n$) $Q_1, Q_2, \ldots, Q_{N_n}$ as the one-point distribution with the concentrated probability on $\mathbf{x}_n$ and the corresponding decoding regions by $\mathcal{D}_1 = \mathcal{D}_2 = \cdots = \mathcal{D}_{N_n} = S_n$, this identification code becomes $(\mu, \lambda)$-achievable. Since there is no restriction on $N_n$, we can set $N_n = +\infty$ for each $n$. Then, it follows that

$$D(\mu, \lambda|\mathbf{W}) = +\infty.$$

That is, the notion of the identification capacity no longer makes sense. □

Remark 6.4.2. Nothing is known about $S(\delta|\mathbf{W})$ if we drop the condition $0 \le \delta < 1$ from Corollary 6.4.2. □



Remark 6.4.3. While Theorem 6.4.2, Corollary 6.4.1 and Corollary 6.4.2 hold under the assumption of the "strong converse property" of a channel $\mathbf{W}$, we have the following inequality for general channels without the strong converse property, claiming that the identification capacity $D(\mu, \lambda|\mathbf{W})$ is upper-bounded by the $\varepsilon$-channel capacity $C(\varepsilon|\mathbf{W})$. This inequality, which can be developed by a slight modification of the proof of Theorem 6.3.1, was proved by Steinberg.

Theorem 6.4.3 (Steinberg [84]). If the input alphabet $\mathcal{X}$ of a channel $\mathbf{W}$ is a finite set, for all $0 \le \mu < 1$, $0 \le \lambda < 1$ and $0 \le \varepsilon < 1$ with $\mu + \lambda < \varepsilon$ it holds that

$$D(\mu, \lambda|\mathbf{W}) \le C(\varepsilon|\mathbf{W}). \tag{6.4.14}$$

By noticing that $C(\varepsilon|\mathbf{W})$ is right-continuous with respect to $\varepsilon$ and setting $\mu = \lambda = 0$ and $\varepsilon \downarrow 0$ in (6.4.14), we can immediately obtain the following corollary:

Corollary 6.4.3. If the input alphabet $\mathcal{X}$ of a channel $\mathbf{W}$ is a finite set, we have

$$D(\mathbf{W}) = D(0, 0|\mathbf{W}) \le C(\mathbf{W}) = C(0|\mathbf{W}). \tag{6.4.15}$$

(We can regard this inequality as the "converse theorem" in the ordinary sense, i.e., the weak converse theorem that differs from Corollary 6.4.1 giving the strong converse theorem.) □

On the other hand, if we set $\mu = \lambda = 0$ in (6.2.10) in Theorem 6.2.1, we can obtain

$$D(\mathbf{W}) = D(0, 0|\mathbf{W}) \ge C(\mathbf{W}) = C(0|\mathbf{W}). \tag{6.4.16}$$

Then, in view of (6.4.15) and (6.4.16) we can obtain the following fundamental theorem connecting the identification capacity $D(\mathbf{W})$ with the channel capacity $C(\mathbf{W})$.

Theorem 6.4.4 (Fundamental theorem). If the input alphabet $\mathcal{X}$ of a channel $\mathbf{W}$ is a finite set, it holds that

$$D(\mathbf{W}) = D(0, 0|\mathbf{W}) = C(\mathbf{W}) = C(0|\mathbf{W}). \tag{6.4.17}$$

We conclude this section by treating a problem that is not the main inter- 
est of this section but is related to the approximation problem of the output 
distribution (the channel resolvability problem). We use the divergence dis- 
tance instead of the variational distance as a measure of approximation. The 
following theorem formulates the well-known fact that the output distribution 
of a channel that corresponds to the input maximizing the mutual informa- 
tion between the input and the output is approximately equal to the output 
distribution of the transmission code with rate equal to the channel capac- 
ity. This theorem corresponds to Theorem 2.6.4 for the fixed-length source 
coding. 



Theorem 6.4.5 (Han and Verdú [46]). Let $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ be an arbitrary channel with a finite input alphabet $\mathcal{X}$ that satisfies the strong converse property. Denoting by $C(\mathbf{W})$ the channel capacity of $\mathbf{W}$, we consider an $(n, M_n, \varepsilon_n)$-code $\mathcal{C}_n = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{M_n}\}$ ($\mathbf{u}_i \in \mathcal{X}^n$) satisfying

$$\lim_{n\to\infty} \varepsilon_n = 0 \quad\text{and}\quad \lim_{n\to\infty} \frac{1}{n} \log M_n = C(\mathbf{W}). \tag{6.4.18}$$

Let $\tilde{Y}^n$ be the channel output corresponding to the input $\tilde{X}^n$ subject to the uniform distribution over $\mathcal{C}_n$. Then, we have

$$\lim_{n\to\infty} \frac{1}{n} D(\tilde{Y}^n \| \overline{Y}^n) = 0,$$

where $\overline{Y}^n$ denotes the channel output that corresponds to the optimal input $\overline{X}^n$ attaining

$$I(\overline{X}^n; \overline{Y}^n) = \max_{X^n} I(X^n; Y^n).$$



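Proof. Since $\mathcal{X}$ is a finite input alphabet, we first have

$$I(\overline{X}^n; \overline{Y}^n) \ge D(W^n(\cdot|\mathbf{x}) \| P_{\overline{Y}^n}) \qquad (\forall \mathbf{x} \in \mathcal{X}^n)$$

from the Kuhn-Tucker theorem (cf. Gallager [30]). Hence, it follows that

$$\begin{aligned}
I(\overline{X}^n; \overline{Y}^n) - I(\tilde{X}^n; \tilde{Y}^n)
&\ge \sum_{\mathbf{x} \in \mathcal{X}^n} P_{\tilde{X}^n}(\mathbf{x}) D(W^n(\cdot|\mathbf{x}) \| P_{\overline{Y}^n}) - \sum_{\mathbf{x} \in \mathcal{X}^n} P_{\tilde{X}^n}(\mathbf{x}) D(W^n(\cdot|\mathbf{x}) \| P_{\tilde{Y}^n}) \\
&= \sum_{\mathbf{x} \in \mathcal{X}^n} \sum_{\mathbf{y} \in \mathcal{Y}^n} P_{\tilde{X}^n}(\mathbf{x}) W^n(\mathbf{y}|\mathbf{x}) \log \frac{P_{\tilde{Y}^n}(\mathbf{y})}{P_{\overline{Y}^n}(\mathbf{y})} \\
&= D(\tilde{Y}^n \| \overline{Y}^n).
\end{aligned} \tag{6.4.19}$$

On the other hand, since the Fano inequality (3.1.10) proved in 2) in Theorem 3.1.1 holds for general channels not restricted to stationary memoryless channels, we have

$$\log M_n \le \frac{I(\tilde{X}^n; \tilde{Y}^n) + h(\varepsilon_n)}{1 - \varepsilon_n},$$

which leads to

$$\liminf_{n\to\infty} \frac{1}{n} \log M_n \le \liminf_{n\to\infty} \frac{1}{n} I(\tilde{X}^n; \tilde{Y}^n).$$

Therefore, we obtain

$$C(\mathbf{W}) = \liminf_{n\to\infty} \frac{1}{n} \log M_n \le \liminf_{n\to\infty} \frac{1}{n} I(\tilde{X}^n; \tilde{Y}^n) \le \liminf_{n\to\infty} \frac{1}{n} I(\overline{X}^n; \overline{Y}^n) \tag{6.4.20}$$

from (6.4.18) and (6.4.19). We notice here that Corollary 3.5.1 in Chapter 3 implies that

$$\liminf_{n\to\infty} \frac{1}{n} I(\overline{X}^n; \overline{Y}^n) = \limsup_{n\to\infty} \frac{1}{n} I(\overline{X}^n; \overline{Y}^n) = C(\mathbf{W})$$

due to the assumption that $\mathcal{X}$ is a finite input alphabet and $\mathbf{W}$ satisfies the strong converse property. The combination of this equality with (6.4.20) yields

$$\liminf_{n\to\infty} \frac{1}{n} I(\tilde{X}^n; \tilde{Y}^n) = C(\mathbf{W}).$$

By dividing both sides of (6.4.19) by $n$ and taking $\limsup_{n\to\infty}$, it follows that

$$\limsup_{n\to\infty} \frac{1}{n} D(\tilde{Y}^n \| \overline{Y}^n) \le \limsup_{n\to\infty} \frac{1}{n} I(\overline{X}^n; \overline{Y}^n) - \liminf_{n\to\infty} \frac{1}{n} I(\tilde{X}^n; \tilde{Y}^n) = C(\mathbf{W}) - C(\mathbf{W}) = 0,$$

which establishes

$$\lim_{n\to\infty} \frac{1}{n} D(\tilde{Y}^n \| \overline{Y}^n) = 0.$$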

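Example 6.4.3. Let us consider the case that $\mathbf{W}$ is the binary symmetric stationary memoryless channel with $\mathcal{X} = \mathcal{Y} = \{0, 1\}$. It is easy to verify that

$$P_{\overline{Y}^n}(\mathbf{y}) = 2^{-n} \qquad (\forall \mathbf{y} \in \mathcal{Y}^n)$$

in this case. Then, Theorem 6.4.5 indicates that the output $\tilde{Y}^n$ of an $(n, M_n, \varepsilon_n)$-code with (6.4.18) satisfies

$$\lim_{n\to\infty} \frac{1}{n} D(\tilde{Y}^n \| \overline{Y}^n) = 0,$$

i.e.,

$$\lim_{n\to\infty} \frac{1}{n} H(\tilde{Y}^n) = \log 2.$$

Hence, the output distribution $P_{\tilde{Y}^n}$ tends to be a distribution arbitrarily close to the uniform distribution as $n \to \infty$. □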
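The passage from the divergence statement to the entropy statement in this example rests on the identity $D(P \| U) = n \log 2 - H(P)$ for the uniform distribution $U$ on $\{0,1\}^n$. A short numerical check of that identity follows; the helper names and the randomly generated test distribution are illustrative assumptions only.

```python
import math
import random

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in p if q > 0)

def divergence(p, q):
    """D(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def random_distribution(size, seed=0):
    rng = random.Random(seed)
    w = [rng.random() + 1e-9 for _ in range(size)]
    s = sum(w)
    return [x / s for x in w]

n = 4
p = random_distribution(2 ** n)
uniform = [2.0 ** -n] * (2 ** n)
gap = divergence(p, uniform) - (n * math.log(2) - entropy(p))
print("difference between the two sides:", gap)
```

Dividing the identity by $n$ shows that a vanishing normalized divergence from the uniform output is the same as a normalized output entropy tending to $\log 2$.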

Theorem 6.4.5 can immediately be generalized to the following corollary concerning channels with cost constraint (see §3.6).

Corollary 6.4.4. Let $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ be an arbitrary channel with a finite input alphabet $\mathcal{X}$ that satisfies the strong converse property under cost constraint $\Gamma$. Denoting by $C_s(\Gamma|\mathbf{W})$ the $\Gamma$-cost channel capacity of $\mathbf{W}$ subject to the cost constraint $\Gamma$, we consider an $(n, M_n, \varepsilon_n, \Gamma)$-code $\mathcal{C}_n = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{M_n}\}$ ($\mathbf{u}_i \in \mathcal{X}^n$) satisfying

$$\lim_{n\to\infty} \varepsilon_n = 0 \quad\text{and}\quad \lim_{n\to\infty} \frac{1}{n} \log M_n = C_s(\Gamma|\mathbf{W}). \tag{6.4.21}$$

Let $\tilde{Y}^n$ be the channel output corresponding to the input $\tilde{X}^n$ subject to the uniform distribution over $\mathcal{C}_n$. Then, we have

$$\lim_{n\to\infty} \frac{1}{n} D(\tilde{Y}^n \| \overline{Y}^n) = 0,$$

where, letting Vn{T) be the set of all input variables X'^ satisfying 
Pr|lc„(X”) <r| = 1, 

denotes the output distribution that corresponds to the optimal input ~X^ 
attaining 

I{X"'‘T")= max I{X^‘Y^). 
x^>-eVr.(n 



Proof. We can prove the theorem by using Corollary 3.7.1 in Chapter 3 in
the same way as the proof of Theorem 6.4.5. In the proof, however, we need
to note that Lemma 3.7.1 (Kuhn-Tucker condition) still holds if $W$ and $c(x)$
are replaced with $W^n$ and $\frac{1}{n}c_n(\mathbf{x})$, respectively, and $X^n$ satisfies

$$\Pr\left\{\frac{1}{n}c_n(X^n) \le \Gamma\right\} = 1.$$



6.5 Identification Capacity with Cost Constraint 

Let us consider here the identification coding problem of channels with cost 
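constraint as described in §3.6, where we consider the cost of each codeword. We
define the cost $c_n(\mathbf{x})$ of $\mathbf{x} \in \mathcal{X}^n$ in the same way as §3.6 and consider the
cost constraint

$$\frac{1}{n}c_n(\mathbf{x}) \le \Gamma. \tag{6.5.1}$$

Furthermore, we set

$$\mathcal{X}^n(\Gamma) = \left\{\mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n}c_n(\mathbf{x}) \le \Gamma\right\} \tag{6.5.2}$$

and denote by $\mathcal{S}_\Gamma$ the collection of all input processes $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ satisfying

$$\Pr\{X^n \in \mathcal{X}^n(\Gamma)\} = 1$$

for all $n = 1, 2, \ldots$. In the identification coding problem with cost constraint
we impose an additional condition that all of the codewords $Q_1, Q_2, \ldots, Q_{N_n}$
(probability distributions over $\mathcal{X}^n$) satisfy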






$$Q_j(\mathcal{X}^n(\Gamma)) = 1 \quad (j = 1, 2, \ldots, N_n) \tag{6.5.3}$$

in the identification coding problem, though the problem itself is formulated
in the same way as before. We call an $(n, N_n, \mu_n, \lambda_n)$-code satisfying (6.5.3)
an $(n, N_n, \mu_n, \lambda_n, \Gamma)$-code and give the following definitions for a general
channel $\mathbf{W}$.

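Definition 6.5.1.

Rate $R$ is $(\mu, \lambda, \Gamma)$-achievable

$\overset{\text{def}}{\Longleftrightarrow}$ There exists an $(n, N_n, \mu_n, \lambda_n, \Gamma)$-code satisfying

$$\limsup_{n\to\infty}\mu_n \le \mu, \quad \limsup_{n\to\infty}\lambda_n \le \lambda \quad\text{and}\quad \liminf_{n\to\infty}\frac{1}{n}\log\log N_n \ge R.$$

Definition 6.5.2 ($(\mu, \lambda, \Gamma)$-channel capacity).

$$D(\mu, \lambda, \Gamma|\mathbf{W}) = \sup\{R \mid R \text{ is } (\mu, \lambda, \Gamma)\text{-achievable}\}.$$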

We have the following theorem under these definitions, which corresponds 
to Theorem 6.2.1. 

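Theorem 6.5.1 (Direct theorem). Let $\mathbf{W}$ be an arbitrary channel. If
$0 \le \varepsilon \le \mu$ and $0 \le \varepsilon \le \lambda$, it holds that

$$D(\mu, \lambda, \Gamma|\mathbf{W}) \ge C_s(\varepsilon, \Gamma|\mathbf{W}) \tag{6.5.4}$$

under cost constraint $\Gamma$, where $C_s(\varepsilon, \Gamma|\mathbf{W})$ means the $(\varepsilon, \Gamma)$-cost channel
capacity defined in Definition 3.6.4.

Proof. The proof parallels the proof of Theorem 6.2.1 except for using the
$(n, M_n, \varepsilon_n, \Gamma)$-code instead of the $(n, M_n, \varepsilon_n)$-code. □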

In order to develop the converse theorem on the identification coding 
problem with cost constraint we need to define the channel resolvability with 
cost constraint similarly to the channel resolvability without cost constraint. 
We will define the channel resolvability with cost constraint in the following 
section. 



6.6 Channel Resolvability with Cost Constraint 

Define $\mathcal{X}^n(\Gamma)$ and $\mathcal{S}_\Gamma$ as in the preceding section and denote by
$\mathcal{V}_n(\Gamma)$ the collection of all random variables $X^n$ satisfying
$\Pr\{X^n \in \mathcal{X}^n(\Gamma)\} = 1$. We give the following definitions corresponding to
Definition 6.3.1, Definition 6.3.3 and Definition 6.3.4, respectively:






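Definition 6.6.1.

Rate $R$ is $(\delta, \Gamma)$-achievable for an input $\mathbf{X} = \{X^n\}_{n=1}^{\infty} \in \mathcal{S}_\Gamma$

$\overset{\text{def}}{\Longleftrightarrow}$ There exists a transform $\tilde{X}^n = \varphi_n(U_{M_n}) \in \mathcal{V}_n(\Gamma)$ satisfying

$$\limsup_{n\to\infty}\frac{1}{n}\log M_n \le R \quad\text{and}\quad \limsup_{n\to\infty} d(Y^n, \tilde{Y}^n) \le \delta,$$

where $Y^n$ and $\tilde{Y}^n$ denote the channel output corresponding to $X^n$ and $\tilde{X}^n$,
respectively.

Definition 6.6.2.

Rate $R$ is $(\delta, \Gamma)$-achievable

$\overset{\text{def}}{\Longleftrightarrow}$ $R$ is $(\delta, \Gamma)$-achievable for all inputs $\mathbf{X} \in \mathcal{S}_\Gamma$.

Definition 6.6.3 (Channel $(\delta, \Gamma)$-resolvability).

$$S(\delta, \Gamma|\mathbf{W}) = \inf\{R \mid R \text{ is } (\delta, \Gamma)\text{-achievable}\}.$$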

We have the following theorem corresponding to Theorem 6.3.2 under these 
definitions. 

Theorem 6.6.1 (Direct theorem). Let $\mathbf{W}$ be an arbitrary channel. Then,
for all $\delta \ge 0$ it holds that

$$S(\delta, \Gamma|\mathbf{W}) \le \sup_{\mathbf{X}\in\mathcal{S}_\Gamma} \overline{I}(\mathbf{X};\mathbf{Y}) \tag{6.6.1}$$

under cost constraint $\Gamma$.

Proof. We have only to notice that in the proof of Theorem 6.3.1 all the elements
in the random code $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_{M_n}$, each of which is generated subject
to the probability distribution $P_{X^n}$, satisfy the cost constraint
$\frac{1}{n}c_n(\mathbf{c}_i) \le \Gamma$ ($i = 1, 2, \ldots, M_n$) provided that
$\mathbf{X} = \{X^n\}_{n=1}^{\infty} \in \mathcal{S}_\Gamma$. □

Furthermore, if we restrict the input $\mathbf{X}$ to $\mathbf{X} \in \mathcal{S}_\Gamma$ and apply the same
argument as in the proof of Lemma 6.4.1, we have the following key lemma
connecting Theorem 6.5.1 with Theorem 6.6.1. Here, we need again the
assumption that $\mathcal{X}$ is a finite input alphabet.

Lemma 6.6.1. Suppose that $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ is an arbitrary channel with
a finite input alphabet $\mathcal{X}$. Then, for all $\delta \ge 0$, $\mu \ge 0$ and $\lambda \ge 0$ satisfying
$\delta < 1 - \mu - \lambda$ it holds that

$$D(\mu, \lambda, \Gamma|\mathbf{W}) \le S(\delta, \Gamma|\mathbf{W}) \tag{6.6.2}$$

under cost constraint $\Gamma$. □






We obtain the following theorem from Theorem 6.5.1, Theorem 6.6.1, 
Lemma 6.6.1, Theorem 3.6.1 and 

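$$C_s(\Gamma|\mathbf{W}) = C_s(0, \Gamma|\mathbf{W}) \le C_s(\varepsilon, \Gamma|\mathbf{W}) \quad (0 \le \forall\varepsilon < 1),$$

where $C_s(\Gamma|\mathbf{W})$ denotes the $\Gamma$-cost channel capacity of a channel $\mathbf{W}$ defined
in Definition 3.6.2.

Theorem 6.6.2. Suppose that $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ is an arbitrary channel with
a finite input alphabet $\mathcal{X}$. Then, for all $\varepsilon \ge 0$, $\delta \ge 0$, $\mu \ge 0$ and $\lambda \ge 0$
satisfying

$$\varepsilon \le \mu, \quad \varepsilon \le \lambda, \quad \delta < 1 - \mu - \lambda$$

it holds that

$$\sup_{\mathbf{X}\in\mathcal{S}_\Gamma} \underline{I}(\mathbf{X};\mathbf{Y}) \le C_s(\varepsilon, \Gamma|\mathbf{W}) \le D(\mu, \lambda, \Gamma|\mathbf{W}) \le S(\delta, \Gamma|\mathbf{W}) \le \sup_{\mathbf{X}\in\mathcal{S}_\Gamma} \overline{I}(\mathbf{X};\mathbf{Y}) \tag{6.6.3}$$

under cost constraint $\Gamma$. □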



Theorem 6.6.2 and Theorem 3.7.1 yield the following theorem. 

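Theorem 6.6.3 (Fundamental theorem). If a channel $\mathbf{W}$ with an input
alphabet $\mathcal{X}$ satisfies the strong converse property under cost constraint $\Gamma$, for
all $\varepsilon \ge 0$, $\delta \ge 0$, $\mu \ge 0$ and $\lambda \ge 0$ satisfying

$$\varepsilon < 1, \quad \mu + \lambda < 1, \quad \delta < 1 \tag{6.6.4}$$

it holds that

$$\sup_{\mathbf{X}\in\mathcal{S}_\Gamma} \underline{I}(\mathbf{X};\mathbf{Y}) = C_s(\varepsilon, \Gamma|\mathbf{W}) = D(\mu, \lambda, \Gamma|\mathbf{W}) = S(\delta, \Gamma|\mathbf{W}) = \sup_{\mathbf{X}\in\mathcal{S}_\Gamma} \overline{I}(\mathbf{X};\mathbf{Y}). \tag{6.6.5}$$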

Theorem 6.6.3 immediately yields the following two corollaries: 

Corollary 6.6.1 (Identification capacity with cost constraint). If a
channel $\mathbf{W}$ with an input alphabet $\mathcal{X}$ satisfies the strong converse property
under cost constraint $\Gamma$, for all $\mu \ge 0$ and $\lambda \ge 0$ with $\mu + \lambda < 1$ we have

$$D(\mu, \lambda, \Gamma|\mathbf{W}) = C_s(\Gamma|\mathbf{W}).$$

Corollary 6.6.2 (Channel resolvability with cost constraint). If a
channel $\mathbf{W}$ with an input alphabet $\mathcal{X}$ satisfies the strong converse property
under cost constraint $\Gamma$, for all $\delta$ with $0 \le \delta < 1$ we have

$$S(\delta, \Gamma|\mathbf{W}) = C_s(\Gamma|\mathbf{W}).$$






Example 6.6.1. Since stationary memoryless channels $\mathbf{W} = \{W\}$ with finite
input and output alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively, satisfy the strong
converse property under any additive cost constraint $\Gamma$ (Theorem 3.7.2),
Corollaries 6.6.1 and 6.6.2 and Theorem 3.6.2 imply that

$$D(\mu, \lambda, \Gamma|\mathbf{W}) = S(\delta, \Gamma|\mathbf{W}) = C_s(\Gamma|\mathbf{W}) = \max_{X:\,\mathrm{E}c(X)\le\Gamma} I(X;Y)$$

for all $\mu \ge 0$ and $\lambda \ge 0$ with $\mu + \lambda < 1$ and $0 \le \delta < 1$. □

6.7 Identification Capacity and Resolvability of
Continuous Input Channels

So far we have obtained the fundamental theorem on the identification
capacity and the channel resolvability (Theorem 6.4.2). However, Lemma 6.4.1,
which provides a key for developing the fundamental theorem, deeply depends
on the assumption that $\mathcal{X}$ is a finite input alphabet. As a consequence, the
fundamental theorem also deeply depends on this assumption. Similarly, the
fundamental theorem on the identification capacity and the channel
resolvability both with cost constraint (Theorem 6.6.3) is based on Lemma 6.6.1
that also deeply depends on the same assumption.

Nevertheless, even if we drop the assumption on the finiteness of the input
alphabet, we can develop a fundamental theorem on the identification capacity
and the channel resolvability under an adequate condition on the “stability”
of the channel outputs with respect to the channel inputs. In this section
we observe channels with the input alphabet $\mathcal{X} = \mathbf{R}$ under cost constraint
as one of such cases, where $\mathbf{R}$ denotes the set of all real numbers. After that,
we give a theorem treating the identification coding problem and the channel
resolvability problem on the additive white (also, non-white) Gaussian noise
channel given in §3.7 as a special case.

Now, let $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ be a channel with cost constraint such that the
input alphabet is $\mathcal{X} = \mathbf{R}$ and the output alphabet $\mathcal{Y}$ is arbitrary. We impose
the following two assumptions on this $\mathbf{W}$. First, letting $c_n(\mathbf{x})$ be a cost function
and setting

$$\mathcal{X}^n(\Gamma) = \left\{\mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n}c_n(\mathbf{x}) \le \Gamma\right\}$$

for arbitrary cost constraint $\Gamma$ in the same way as §6.5, we assume that there
exists an $n$-dimensional hypercube $V_n(\Gamma)$ of edge length $l_n(\Gamma) > 2$ in $\mathcal{X}^n$
satisfying

$$\limsup_{n\to\infty}\frac{1}{n}\log\log l_n(\Gamma) = 0 \tag{6.7.1}$$

and






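$$\mathcal{X}^n(\Gamma) \subset V_n(\Gamma) \quad (n = 1, 2, \ldots). \tag{6.7.2}$$

Next, letting $D(W^n(\cdot|\mathbf{v})\|W^n(\cdot|\mathbf{x}))$ ($\mathbf{v}, \mathbf{x} \in \mathcal{X}^n$) be the divergence between
conditional distributions $W^n(\cdot|\mathbf{v})$ and $W^n(\cdot|\mathbf{x})$, we define the $n \times n$ Fisher
information matrix $F_n(\mathbf{x})$ by

$$F_n(\mathbf{x}) = \left(\frac{\partial^2}{\partial v_i\,\partial v_j} D(W^n(\cdot|\mathbf{v})\|W^n(\cdot|\mathbf{x}))\right)_{i,j=1,\ldots,n}, \tag{6.7.3}$$

where $\mathbf{v} = (v_1, v_2, \ldots, v_n)$. Moreover, we define the norm of $F_n(\mathbf{x})$ by

$$\|F_n(\mathbf{x})\| = \sup_{\mathbf{v}\ne\mathbf{0}} \frac{\mathbf{v}F_n(\mathbf{x})\mathbf{v}^{\mathrm{T}}}{\mathbf{v}\mathbf{v}^{\mathrm{T}}},$$

where the superscript “T” denotes the transpose of a vector. We assume that

$$\limsup_{n\to\infty}\frac{1}{n}\log\max\left\{1, \log\left(\sup_{\mathbf{x}\in V_n(\Gamma)}\|F_n(\mathbf{x})\|\right)\right\} = 0. \tag{6.7.4}$$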

Notice that the assumptions (6.7.1) and (6.7.4) are quite mild. Under
these assumptions we prepare the following. First, (6.7.4) tells us that

$$\sup_{\mathbf{x}\in V_n(\Gamma)}\|F_n(\mathbf{x})\| \le \exp(e^{n\gamma_n}) \tag{6.7.5}$$

for some $\{\gamma_n > 0\}$ satisfying $\lim_{n\to\infty}\gamma_n = 0$. We set



In 




(6.7.6) 



and partition the hypercube $V_n(\Gamma)$ in (6.7.2) into small hypercubes with edge
length

$$\Delta_n = \exp(-e^{n\bar{\gamma}_n}). \tag{6.7.7}$$

Denoting by $A_n^{(i)}$ ($i = 1, 2, \ldots, k_n(\Gamma)$) such small hypercubes, the number
$k_n(\Gamma)$ of the small hypercubes is given by

$$k_n(\Gamma) = \left(\frac{l_n(\Gamma)}{\Delta_n}\right)^n.$$

Then, it follows from (6.7.1), (6.7.7) and $\lim_{n\to\infty}\bar{\gamma}_n = 0$ that

$$\limsup_{n\to\infty}\frac{1}{n}\log\log k_n(\Gamma) = 0. \tag{6.7.8}$$

For each $i = 1, 2, \ldots, k_n(\Gamma)$ we adequately choose a representative point $\mathbf{u}_i$
in the hypercube $A_n^{(i)}$ and set

$$\mathcal{R}_n(\Gamma) = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{k_n(\Gamma)}\}. \tag{6.7.9}$$

Then, for any probability distribution $Q$ over $\mathcal{X}^n$ satisfying $Q(\mathcal{X}^n(\Gamma)) = 1$
we define the probability distribution $\hat{Q}$ over $\mathcal{R}_n(\Gamma)$ by

$$\hat{Q}(\mathbf{u}_i) = Q(A_n^{(i)}) \quad (i = 1, 2, \ldots, k_n(\Gamma)) \tag{6.7.10}$$

(quantization of a probability distribution). Then, we have the following
lemma characterizing the variational distance between $QW^n$ and $\hat{Q}W^n$.
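
The quantization step (6.7.10) can be sketched in Python as follows. This is only an illustration under stated assumptions: the distribution is taken to be discrete with finitely many atoms, the cell center is chosen as the representative point (the text only asks for an adequately chosen point in each cell), and the names `quantize`, `lower` and `delta` are hypothetical.

```python
import math
from collections import defaultdict

def quantize(q, lower, delta):
    """Move the mass of each atom of q to the center of its grid cell.

    q     : dict mapping points (tuples of floats) to probabilities
    lower : lower corner coordinate of the enclosing hypercube (same in each axis)
    delta : edge length of the small hypercubes
    """
    q_hat = defaultdict(float)
    for x, p in q.items():
        # index of the small hypercube containing x
        cell = tuple(math.floor((xi - lower) / delta) for xi in x)
        # representative point: the center of that hypercube (an assumption)
        rep = tuple(lower + (c + 0.5) * delta for c in cell)
        q_hat[rep] += p
    return dict(q_hat)

# Toy 2-dimensional example with three atoms (illustrative numbers only).
q = {(0.1, 0.1): 0.25, (0.2, 0.3): 0.25, (0.9, 0.9): 0.5}
q_hat = quantize(q, lower=0.0, delta=0.5)
print(q_hat)
```

Because the masses of atoms in the same cell are simply added, every probability of the quantized distribution is a sum of probabilities of the original one, so total mass is preserved.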






Lemma 6.7.1. Under the assumptions (6.7.1) and (6.7.4), for any probability
distribution $Q$ satisfying $Q(\mathcal{X}^n(\Gamma)) = 1$ and all $n = 1, 2, \ldots$ it holds
that

d{QW^,QW^) < v^exp(-ie^). (6.7.11) 

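Proof. The Taylor expansion of the divergence $D(W^n(\cdot|\mathbf{x})\|W^n(\cdot|\mathbf{u}_i))$ with
respect to $\mathbf{x}$ can be written as

$$D(W^n(\cdot|\mathbf{x})\|W^n(\cdot|\mathbf{u}_i)) = \frac{1}{2}(\mathbf{x} - \mathbf{u}_i)F_n(\mathbf{x}')(\mathbf{x} - \mathbf{u}_i)^{\mathrm{T}}$$

for any $\mathbf{x} \in A_n^{(i)}$, where $\mathbf{x}' = \mathbf{u}_i + \theta(\mathbf{x} - \mathbf{u}_i)$ ($0 \le \theta \le 1$). Note that $\mathbf{x}' \in A_n^{(i)}$.
Then, it follows from the definition of the norm $\|F_n(\mathbf{x}')\|$ that

$$D(W^n(\cdot|\mathbf{x})\|W^n(\cdot|\mathbf{u}_i)) \le \frac{1}{2}\|F_n(\mathbf{x}')\|(\mathbf{x} - \mathbf{u}_i)(\mathbf{x} - \mathbf{u}_i)^{\mathrm{T}}. \tag{6.7.12}$$

We notice here that, since both $\mathbf{x}$ and $\mathbf{u}_i$ belong to $A_n^{(i)}$, the absolute value
of each component of $\mathbf{x} - \mathbf{u}_i$ is less than or equal to $\Delta_n$. Hence,

$$(\mathbf{x} - \mathbf{u}_i)(\mathbf{x} - \mathbf{u}_i)^{\mathrm{T}} \le n\Delta_n^2. \tag{6.7.13}$$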

Thus, in view of (6.7.5)-(6.7.7) and (6.7.13), (6.7.12) can be evaluated in the 
following way: 

I^(W-(.|x)||W-(.|u,)) 

< |exp(e”'*")exp(-2e"^") 

= |exp(e"'^" -2e”'>'”) 

< — exp(— e"'*'”) 

Zi 

<-exp(-e^). (6.7.14) 

Using the inequality 

$$\frac{1}{2}d(P_1, P_2)^2 \le D(P_1\|P_2)$$

for two probability distributions $P_1$ and $P_2$ (cf. Csiszár and Körner [19]),
(6.7.14) leads to 

d(TU”(-|x),tU"(.|uO) < ^/Hexp(-ie^). 

Therefore, for any $\mathbf{x} \in A_n^{(i)}$ and any subset $B \subset \mathcal{Y}^n$ we have
|lU”(S|x) - lU"(B|ui)| < ^exp(-Ie^^). 



(6.7.15) 




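On the other hand, by using $Q(\mathcal{X}^n(\Gamma)) = 1$, it follows from (6.7.10) that

$$QW^n(B) - \hat{Q}W^n(B) = \sum_{\mathbf{x}\in\mathcal{X}^n(\Gamma)} W^n(B|\mathbf{x})Q(\mathbf{x}) - \sum_{i=1}^{k_n(\Gamma)} W^n(B|\mathbf{u}_i)\hat{Q}(\mathbf{u}_i)$$

$$= \sum_{i=1}^{k_n(\Gamma)}\sum_{\mathbf{x}\in A_n^{(i)}} W^n(B|\mathbf{x})Q(\mathbf{x}) - \sum_{i=1}^{k_n(\Gamma)}\sum_{\mathbf{x}\in A_n^{(i)}} W^n(B|\mathbf{u}_i)Q(\mathbf{x})$$

$$= \sum_{i=1}^{k_n(\Gamma)}\sum_{\mathbf{x}\in A_n^{(i)}} \left(W^n(B|\mathbf{x}) - W^n(B|\mathbf{u}_i)\right)Q(\mathbf{x}).$$

Hence, it holds that

$$|QW^n(B) - \hat{Q}W^n(B)| \le \sum_{i=1}^{k_n(\Gamma)}\sum_{\mathbf{x}\in A_n^{(i)}} \left|W^n(B|\mathbf{x}) - W^n(B|\mathbf{u}_i)\right|Q(\mathbf{x}).$$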

Substituting (6.7.15) into this inequality yields 

knin j — -j 

\QW^(B)-QW^(B)\ < Y QK)^exp(--e'^) 




Consequently, we obtain 

d{QW^,QW^) = 2 sup \QW^{B) -QW^{8)\ 
6cy^ 




which completes the proof of the lemma. □ 
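
The inequality $\frac{1}{2}d(P_1,P_2)^2 \le D(P_1\|P_2)$ used in the proof can be checked numerically in a simple case. The Python sketch below assumes two scalar Gaussian distributions with a common variance (an illustrative choice, not the general setting of the lemma), for which both sides have closed forms; divergences are in nats and the function names are hypothetical.

```python
import math

def kl_gauss(x, u, var):
    """D(N(x, var) || N(u, var)) in nats."""
    return (x - u) ** 2 / (2.0 * var)

def variational_gauss(x, u, var):
    """d(P1, P2) = 2 sup_B |P1(B) - P2(B)| for N(x, var) and N(u, var)."""
    return 2.0 * math.erf(abs(x - u) / (2.0 * math.sqrt(2.0 * var)))

def pinsker_holds(x, u, var):
    """Check (1/2) d^2 <= D, with a tiny tolerance for rounding."""
    d = variational_gauss(x, u, var)
    return 0.5 * d * d <= kl_gauss(x, u, var) + 1e-15

gaps = [1e-6, 1e-3, 0.1, 1.0, 3.0, 10.0]
results = [pinsker_holds(0.0, g, 1.0) for g in gaps]
print(results)
```

The check also illustrates the mechanism of the lemma: as the gap between an input and its representative point shrinks, the divergence and hence the variational distance between the corresponding outputs shrink as well.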

Now, we are ready to describe a relationship between the identification 
capacity and the channel resolvability with cost constraint for the case of
$\mathcal{X} = \mathbf{R}$. To this end, we prepare the following lemma corresponding to
Lemma 6.4.1 in §6.4 and Lemma 6.6.1 in §6.6. Notice here that we do not need
the assumption that $\mathcal{X}$ is a finite input alphabet.

Lemma 6.7.2. Assume that a channel $\mathbf{W}$ with the input alphabet $\mathcal{X} = \mathbf{R}$
satisfies the assumptions (6.7.1) and (6.7.4). Then, for all $\delta \ge 0$, $\mu \ge 0$ and
$\lambda \ge 0$ with $\delta < 1 - \mu - \lambda$ it holds that

$$D(\mu, \lambda, \Gamma|\mathbf{W}) \le S(\delta, \Gamma|\mathbf{W}) \tag{6.7.16}$$

under cost constraint $\Gamma$.






Proof. Though the proof parallels the proof of Lemma 6.4.1, we need to
notice here that $\mathcal{X}$ is not a finite input alphabet. Suppose that $R_1$ is
$(\mu, \lambda, \Gamma)$-achievable as an arbitrary rate of the identification code with cost
constraint. Then, due to the definition of the $(\mu, \lambda, \Gamma)$-achievability, there
exists an $(n, N_n, \mu_n, \lambda_n, \Gamma)$-code satisfying

$$\limsup_{n\to\infty}\mu_n \le \mu, \quad \limsup_{n\to\infty}\lambda_n \le \lambda, \tag{6.7.17}$$

$$\liminf_{n\to\infty}\frac{1}{n}\log\log N_n \ge R_1. \tag{6.7.18}$$

Let $Q_1, Q_2, \ldots, Q_{N_n}$ be the codewords (probability distributions over $\mathcal{X}^n(\Gamma)$)
of this identification code and denote by $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_{N_n} \subset \mathcal{Y}^n$ the
corresponding decoding regions. Then, for any $j \ne k$ it holds that

$$d(Q_jW^n, Q_kW^n) \ge 2\left(Q_jW^n(\mathcal{D}_j) - Q_kW^n(\mathcal{D}_j)\right) \ge 2(1 - \mu_n - \lambda_n).$$

Thus, we can obtain

$$\liminf_{n\to\infty} d(Q_jW^n, Q_kW^n) \ge 2\left(1 - \limsup_{n\to\infty}\mu_n - \limsup_{n\to\infty}\lambda_n\right) \ge 2(1 - \mu - \lambda) \tag{6.7.19}$$

from (6.7.17). On the other hand, suppose that $R_2$ is $(\delta, \Gamma)$-achievable as an
arbitrary rate of the channel resolvability with cost constraint. Then, there
exist probability distributions $\tilde{Q}_j$ ($j = 1, 2, \ldots, N_n$) of the $M_n$-type over
$\mathcal{X}^n(\Gamma)$ satisfying

$$\limsup_{n\to\infty} d(Q_jW^n, \tilde{Q}_jW^n) \le \delta, \tag{6.7.20}$$

$$\limsup_{n\to\infty}\frac{1}{n}\log M_n \le R_2 \tag{6.7.21}$$

for all $j = 1, 2, \ldots, N_n$. So far, the argument is the same as the proof of
Lemma 6.4.1 except for the point that we are considering the cost constraint
$\Gamma$.

Now, let $\hat{Q}_j$ denote the probability distribution $\hat{Q}$ over $\mathcal{R}_n(\Gamma)$ defined
by (6.7.10), setting $\tilde{Q}_j$ instead of $Q$. Since $\tilde{Q}_j$ is of the $M_n$-type, so is $\hat{Q}_j$. By
applying Lemma 6.7.1 with $\tilde{Q}_j$ and $\hat{Q}_j$ instead of $Q$ and $\hat{Q}$, respectively,
the bound (6.7.11) holds for $d(\tilde{Q}_jW^n, \hat{Q}_jW^n)$, which implies

$$\lim_{n\to\infty} d(\tilde{Q}_jW^n, \hat{Q}_jW^n) = 0. \tag{6.7.22}$$

On the other hand, since it holds that






$$d(Q_jW^n, \hat{Q}_jW^n) \le d(Q_jW^n, \tilde{Q}_jW^n) + d(\tilde{Q}_jW^n, \hat{Q}_jW^n),$$

(6.7.20) and (6.7.22) yield

$$\limsup_{n\to\infty} d(Q_jW^n, \hat{Q}_jW^n) \le \delta \quad (j = 1, 2, \ldots, N_n). \tag{6.7.23}$$

Now, suppose that $\hat{Q}_j = \hat{Q}_k$ for some $j \ne k$. Then, since we have

$$d(Q_jW^n, Q_kW^n) \le d(Q_jW^n, \hat{Q}_jW^n) + d(Q_kW^n, \hat{Q}_jW^n) = d(Q_jW^n, \hat{Q}_jW^n) + d(Q_kW^n, \hat{Q}_kW^n),$$

(6.7.23) tells us that

$$\limsup_{n\to\infty} d(Q_jW^n, Q_kW^n) \le 2\delta. \tag{6.7.24}$$

However, (6.7.24) contradicts (6.7.19) because $\delta < 1 - \mu - \lambda$ from the
assumption of the lemma. Hence, $\hat{Q}_j \ne \hat{Q}_k$ must be satisfied if $j \ne k$, i.e.,

$$\hat{Q}_1, \hat{Q}_2, \ldots, \hat{Q}_{N_n}$$

must be distinct probability distributions of the $M_n$-type over $\mathcal{R}_n(\Gamma)$. On
the other hand, since there are at most

$$(k_n(\Gamma))^{M_n}$$

probability distributions of the $M_n$-type over $\mathcal{R}_n(\Gamma)$, it must hold that
$N_n \le (k_n(\Gamma))^{M_n}$. Therefore,

$$\log N_n \le M_n\log k_n(\Gamma),$$

which leads to

$$\frac{1}{n}\log\log N_n \le \frac{1}{n}\log M_n + \frac{1}{n}\log\log k_n(\Gamma).$$

Then, by virtue of (6.7.8) we obtain

$$\liminf_{n\to\infty}\frac{1}{n}\log\log N_n \le \limsup_{n\to\infty}\frac{1}{n}\log M_n,$$

which, together with (6.7.18) and (6.7.21), shows that

$$R_1 \le R_2.$$

This establishes the claim of the lemma because $R_1$ is an arbitrary
$(\mu, \lambda, \Gamma)$-achievable rate of the identification code and $R_2$ is an arbitrary
$(\delta, \Gamma)$-achievable rate as the channel resolvability. □
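
The counting step in the proof, namely that there are at most $k^M$ distributions of the $M$-type over a set of $k$ points, can be checked by brute force for small values. The Python sketch below is an illustration only; the function names are hypothetical, and an $M$-type distribution is identified with its vector of counts (each probability being a multiple of $1/M$).

```python
from math import comb
from itertools import combinations_with_replacement

def count_m_types(k, m):
    """Number of M-type distributions on k points (multisets of size m)."""
    return comb(m + k - 1, k - 1)

def enumerate_m_types(k, m):
    """Enumerate M-type distributions on k points as count vectors."""
    types = set()
    for multiset in combinations_with_replacement(range(k), m):
        counts = tuple(multiset.count(i) for i in range(k))
        types.add(counts)
    return types

for k in range(1, 6):
    for m in range(1, 6):
        exact = len(enumerate_m_types(k, m))
        # closed form agrees with enumeration and never exceeds k**m
        print(k, m, exact, count_m_types(k, m), k ** m)
```

The bound $k^M$ is loose but sufficient: taking logarithms twice turns it into the additive estimate used in the proof.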

Since Theorem 6.5.1 in §6.5 and Theorem 6.6.1 in §6.6 hold for any input 
and output alphabets, we immediately obtain the following theorem corre- 
sponding to Theorem 6.6.2 by using Lemma 6.7.2 instead of Lemma 6.6.1 
and recalling Theorem 3.6.1 and 






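$$C_s(\Gamma|\mathbf{W}) = C_s(0, \Gamma|\mathbf{W}) \le C_s(\varepsilon, \Gamma|\mathbf{W}) \quad (0 \le \forall\varepsilon < 1),$$

where $C_s(\Gamma|\mathbf{W})$ denotes the $\Gamma$-cost channel capacity of a channel $\mathbf{W}$ defined
in Definition 3.6.2.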

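Theorem 6.7.1. Suppose that $\mathbf{W}$ is an arbitrary channel with the input
alphabet $\mathcal{X} = \mathbf{R}$ satisfying the assumptions (6.7.1) and (6.7.4). Then, for all
$\varepsilon \ge 0$, $\delta \ge 0$, $\mu \ge 0$ and $\lambda \ge 0$ satisfying

$$\varepsilon \le \mu, \quad \varepsilon \le \lambda, \quad \delta < 1 - \mu - \lambda$$

it holds that

$$\sup_{\mathbf{X}\in\mathcal{S}_\Gamma} \underline{I}(\mathbf{X};\mathbf{Y}) \le C_s(\varepsilon, \Gamma|\mathbf{W}) \le D(\mu, \lambda, \Gamma|\mathbf{W}) \le S(\delta, \Gamma|\mathbf{W}) \le \sup_{\mathbf{X}\in\mathcal{S}_\Gamma} \overline{I}(\mathbf{X};\mathbf{Y}) \tag{6.7.25}$$

under cost constraint $\Gamma$. □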



Theorem 6.7.1 and Theorem 3.7.1 yield the following theorem. 

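Theorem 6.7.2 (Fundamental theorem). If a channel $\mathbf{W}$ with the input
alphabet $\mathcal{X} = \mathbf{R}$, which satisfies the assumptions (6.7.1) and (6.7.4), satisfies
the strong converse property under cost constraint $\Gamma$, then for all $\varepsilon \ge 0$, $\delta \ge 0$,
$\mu \ge 0$ and $\lambda \ge 0$ with

$$\varepsilon < 1, \quad \mu + \lambda < 1, \quad \delta < 1$$

it holds that

$$\sup_{\mathbf{X}\in\mathcal{S}_\Gamma} \underline{I}(\mathbf{X};\mathbf{Y}) = C_s(\varepsilon, \Gamma|\mathbf{W}) = D(\mu, \lambda, \Gamma|\mathbf{W}) = S(\delta, \Gamma|\mathbf{W}) = \sup_{\mathbf{X}\in\mathcal{S}_\Gamma} \overline{I}(\mathbf{X};\mathbf{Y}). \tag{6.7.26}$$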

The following two corollaries are immediate consequences of Theorem 6.7.2. 

Corollary 6.7.1. If a channel $\mathbf{W}$ with the input alphabet $\mathcal{X} = \mathbf{R}$, which
satisfies the assumptions (6.7.1) and (6.7.4), satisfies the strong converse
property under cost constraint $\Gamma$, then for all $\mu \ge 0$ and $\lambda \ge 0$ with $\mu + \lambda < 1$
we have

$$D(\mu, \lambda, \Gamma|\mathbf{W}) = C_s(\Gamma|\mathbf{W}).$$

Corollary 6.7.2. If a channel $\mathbf{W}$ with the input alphabet $\mathcal{X} = \mathbf{R}$, which
satisfies the assumptions (6.7.1) and (6.7.4), satisfies the strong converse
property under cost constraint $\Gamma$, then for all $\delta$ with $0 \le \delta < 1$ we have

$$S(\delta, \Gamma|\mathbf{W}) = C_s(\Gamma|\mathbf{W}).$$






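Now, let us consider the stationary memoryless additive white Gaussian
noise channel (AWGN channel) $\mathbf{W} = \{W\}$ given in §3.7 as an example of
the channels with $\mathcal{X} = \mathbf{R}$. Recall that the input and output alphabets of this
channel are $\mathcal{X} = \mathcal{Y} = \mathbf{R}$ and the probability transition density is given by

$$W(y|x) = \frac{1}{\sqrt{2\pi N}}\exp\left(-\frac{(y - x)^2}{2N}\right). \tag{6.7.27}$$

Setting

$$\mathbf{v} = (v_1, v_2, \ldots, v_n) \in \mathcal{X}^n, \quad \mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n,$$

it follows that

$$D(W^n(\cdot|\mathbf{v})\|W^n(\cdot|\mathbf{x})) = \sum_{i=1}^{n} D(W(\cdot|v_i)\|W(\cdot|x_i))$$

because the channel is stationary memoryless. In addition, simple computation
yields

$$D(W(\cdot|v_i)\|W(\cdot|x_i)) = \frac{(v_i - x_i)^2}{2N}.$$

Thus, we have

$$D(W^n(\cdot|\mathbf{v})\|W^n(\cdot|\mathbf{x})) = \frac{1}{2N}\sum_{i=1}^{n}(v_i - x_i)^2.$$

Then, the Fisher information matrix $F_n(\mathbf{x})$ in (6.7.3) becomes the diagonal
matrix as follows:

$$F_n(\mathbf{x}) = \begin{pmatrix} \frac{1}{N} & & 0 \\ & \ddots & \\ 0 & & \frac{1}{N} \end{pmatrix},$$

which implies $\|F_n(\mathbf{x})\| = \frac{1}{N}$. Hence, the assumption (6.7.4) is clearly
satisfied. Next, we verify that the assumption (6.7.1) is also satisfied. Since the
cost constraint on the AWGN channel is given by

$$\frac{1}{n}c_n(\mathbf{x}) = \frac{1}{n}(x_1^2 + x_2^2 + \cdots + x_n^2) \le P,$$

where $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$,

$$-\sqrt{nP} \le x_i \le \sqrt{nP} \quad (i = 1, 2, \ldots, n)$$

must be satisfied. Hence,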






$$\mathcal{X}^n(P) = \left\{\mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n}c_n(\mathbf{x}) \le P\right\}$$

is included in an $n$-dimensional hypercube $V_n(P)$ with edge length
$l_n(P) = 2\sqrt{nP}$, which shows that the assumption (6.7.1) is satisfied. That is, the
AWGN channel satisfies both (6.7.1) and (6.7.4). By recalling Theorem 3.7.3
and Theorem 3.7.4 (strong converse theorem), we obtain the following result
from Theorem 6.7.2.
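
Both assumptions can be checked numerically for the AWGN channel. The Python sketch below is illustrative only: it assumes the closed-form divergence $\frac{1}{2N}\sum_i (v_i - x_i)^2$ (in nats), estimates the second derivatives by central finite differences, and evaluates $\frac{1}{n}\log\log l_n(P)$ with $l_n(P) = 2\sqrt{nP}$. All function names are hypothetical.

```python
import math

def awgn_divergence(v, x, noise_var):
    """D(W^n(.|v) || W^n(.|x)) in nats for the memoryless AWGN channel."""
    return sum((vi - xi) ** 2 for vi, xi in zip(v, x)) / (2.0 * noise_var)

def hessian_entry(x, i, j, noise_var, h=1e-3):
    """Central finite-difference estimate of d^2 D / dv_i dv_j at v = x."""
    def d_at(di, dj):
        v = list(x)
        v[i] += di
        v[j] += dj
        return awgn_divergence(v, x, noise_var)
    return (d_at(h, h) - d_at(h, -h) - d_at(-h, h) + d_at(-h, -h)) / (4.0 * h * h)

def edge_rate(n, power):
    """(1/n) log log l_n(P) with l_n(P) = 2 * sqrt(n * P)."""
    return math.log(math.log(2.0 * math.sqrt(n * power))) / n

x = [0.3, -1.2, 0.7]
noise_var = 2.0
diag = hessian_entry(x, 0, 0, noise_var)      # close to 1 / noise_var
off_diag = hessian_entry(x, 0, 1, noise_var)  # close to 0
rates = [edge_rate(n, 1.0) for n in (10, 1000, 10 ** 6)]
print(diag, off_diag, rates)
```

The diagonal estimate reproduces $1/N$ independently of $\mathbf{x}$, and the edge-length rate decreases towards zero as $n$ grows, in line with (6.7.1).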



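Theorem 6.7.3. Suppose that $\mathbf{W} = \{W\}$ is the AWGN channel. Then, for
all $\varepsilon \ge 0$, $\delta \ge 0$, $\mu \ge 0$ and $\lambda \ge 0$ with

$$\varepsilon < 1, \quad \mu + \lambda < 1, \quad \delta < 1$$

it holds that

$$C_s(\varepsilon, P|\mathbf{W}) = D(\mu, \lambda, P|\mathbf{W}) = S(\delta, P|\mathbf{W}) = C_s(P|\mathbf{W}) = \frac{1}{2}\log\left(1 + \frac{P}{N}\right). \tag{6.7.28}$$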

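Example 6.7.1.¹ Theorem 6.7.3 implies that for all $\mu \ge 0$ and $\lambda \ge 0$ with
$\mu + \lambda < 1$ we have

$$D(\mu, \lambda, P|\mathbf{W}) = C_s(P|\mathbf{W}) = \frac{1}{2}\log\left(1 + \frac{P}{N}\right),$$

and for all $\delta$ with $0 \le \delta < 1$ we have

$$S(\delta, P|\mathbf{W}) = C_s(P|\mathbf{W}) = \frac{1}{2}\log\left(1 + \frac{P}{N}\right).$$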
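
As a small numerical illustration, the common value $\frac{1}{2}\log(1 + P/N)$ can be evaluated in Python. The function name and the sample values of $P$ and $N$ below are assumptions chosen for the sketch; the result is in nats per channel use.

```python
import math

def awgn_capacity(power, noise_var):
    """(1/2) log(1 + P/N) in nats per channel use."""
    return 0.5 * math.log(1.0 + power / noise_var)

c_equal = awgn_capacity(1.0, 1.0)   # P = N
c_high = awgn_capacity(15.0, 1.0)   # P = 15 N
print(c_equal, c_high)
```

By the example above, the same number serves as the identification capacity (as a double-exponential rate) and as the channel resolvability.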



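Finally, let us consider the stationary additive but non-white Gaussian
noise channel (ANWGN channel) $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ also given in §3.7 as another
example of the channels with $\mathcal{X} = \mathcal{Y} = \mathbf{R}$. Since the probability transition
density of $W^n$ is given by

$$W^n(\mathbf{y}|\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n\det V_n}}\exp\left[-\frac{1}{2}(\mathbf{y} - \mathbf{x})V_n^{-1}(\mathbf{y} - \mathbf{x})^{\mathrm{T}}\right] \tag{6.7.29}$$

for any $\mathbf{x} \in \mathcal{X}^n$ and $\mathbf{y} \in \mathcal{Y}^n$, simple computation shows that

$$D(W^n(\cdot|\mathbf{v})\|W^n(\cdot|\mathbf{x})) = \frac{1}{2}(\mathbf{v} - \mathbf{x})V_n^{-1}(\mathbf{v} - \mathbf{x})^{\mathrm{T}}$$

for any $\mathbf{v}, \mathbf{x} \in \mathcal{X}^n$, where $V_n$ is the covariance matrix defined in (3.7.28).
Then, the Fisher information matrix $F_n(\mathbf{x})$ is calculated as

$$F_n(\mathbf{x}) = V_n^{-1}.$$

Hence, the norm $\|F_n(\mathbf{x})\|$ can be expressed as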



¹ Recently and independently, this result was also given by Burnashev [13].






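$$\|F_n(\mathbf{x})\| = \frac{1}{\min(N_1^{(n)}, \ldots, N_n^{(n)})}, \tag{6.7.30}$$

where $N_1^{(n)}, \ldots, N_n^{(n)}$ are the eigenvalues of $V_n$ (cf. (3.7.30)). On the
other hand, from (3.7.41) in Lemma 3.7.3 we see that for any positive constant
$\delta > 0$ there exists an $n(\delta)$ such that

$$\frac{1}{n}\sum_{i=1}^{n}\frac{1}{2}\log\left(1 + \frac{P_i^{(n)}}{N_i^{(n)}}\right) \le C_s(P|\mathbf{W}) + \delta \quad (\forall n \ge n(\delta)), \tag{6.7.31}$$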

where for i = 1, 2, • • • , n = max[Ap ^ 0] and Ap ^ is the number 

determined from (3.7.39). Therefore, it follows that 



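$$\frac{1}{2n}\log\left(1 + \frac{P_i^{(n)}}{N_i^{(n)}}\right) \le C_s(P|\mathbf{W}) + \delta \quad (\forall n \ge n(\delta);\ \forall i = 1, 2, \ldots, n),$$

which yields

$$\log\left(1 + \frac{P_i^{(n)}}{N_i^{(n)}}\right) \le 2n(C_s(P|\mathbf{W}) + \delta) \quad (\forall n \ge n(\delta);\ \forall i = 1, 2, \ldots, n). \tag{6.7.32}$$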



Now, since > P from (3.7.40), (6.7.32) leads to 



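$$\log\left(\frac{P}{N_i^{(n)}}\right) \le 2n(C_s(P|\mathbf{W}) + \delta) \quad (\forall n \ge n(\delta);\ \forall i = 1, 2, \ldots, n). \tag{6.7.33}$$

Thus, we have

$$N_i^{(n)} \ge Pe^{-2n(C_s(P|\mathbf{W}) + \delta)} \quad (\forall n \ge n(\delta);\ \forall i = 1, 2, \ldots, n). \tag{6.7.34}$$

Therefore,

$$\min(N_1^{(n)}, \ldots, N_n^{(n)}) \ge Pe^{-2n(C_s(P|\mathbf{W}) + \delta)}. \tag{6.7.35}$$

Substitution of (6.7.35) into (6.7.30) yields

$$\|F_n(\mathbf{x})\| \le \frac{1}{P}e^{2n(C_s(P|\mathbf{W}) + \delta)}, \tag{6.7.36}$$

which shows that the assumption (6.7.4) is satisfied. Furthermore, the
assumption (6.7.1) is obviously satisfied because, as has been discussed for the
case of the AWGN channel, we have $-\sqrt{nP} \le x_i \le \sqrt{nP}$ under the cost
constraint

$$\frac{1}{n}c_n(\mathbf{x}) = \frac{1}{n}(x_1^2 + x_2^2 + \cdots + x_n^2) \le P,$$

and therefore,

$$\mathcal{X}^n(P) = \left\{\mathbf{x} \in \mathcal{X}^n \,\middle|\, \frac{1}{n}c_n(\mathbf{x}) \le P\right\}$$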






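is included in an $n$-dimensional hypercube $V_n(P)$ with edge length
$l_n(P) = 2\sqrt{nP}$.

By recalling Theorem 3.7.5 (strong converse theorem) and Lemma 3.7.4,
we obtain the following result from Theorem 6.7.2.

Theorem 6.7.4. Suppose that $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ is the ANWGN channel.
Then, for all $\varepsilon \ge 0$, $\delta \ge 0$, $\mu \ge 0$ and $\lambda \ge 0$ with

$$\varepsilon < 1, \quad \mu + \lambda < 1, \quad \delta < 1$$

it holds that

$$C_s(\varepsilon, P|\mathbf{W}) = D(\mu, \lambda, P|\mathbf{W}) = S(\delta, P|\mathbf{W}) = C_s(P|\mathbf{W})$$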

where /(A) and g{\) are the functions defined in (3.7.44) 

spectively. □ 



Example 6.7.2. Theorem 6.7.4 implies that for all $\mu \ge 0$ and $\lambda \ge 0$ with
$\mu + \lambda < 1$ we have

D{fx, A, P|W) = C,(P|W) = log (l + ® ) d\, 

and for all $\delta$ with $0 \le \delta < 1$ we have

5(A, P|W) = C,(P|W) = T log (i + M) d\. 



6.8 Identification- Transmission Codes 

So far we have given the basic notion and theorems on identification coding 
together with the channel resolvability problem. If we combine the identifica- 
tion coding with the conventional transmission coding defined in Chapter 3, 
we can acquire a coding called the identification-transmission coding for a 
channel which is theoretically and practically of interest. This section is de- 
voted to description of the identification-transmission coding. Before giving 
formal definitions, let us intuitively explain the identification-transmission 
coding as well as its practical background. 

First, let us consider a communication system with one transmitter $S$
and $N_n$ receivers $L_1, L_2, \ldots, L_{N_n}$ given in Fig. 6.2 (here, the channel $\mathbf{W}$
is assumed to satisfy the strong converse property for the sake of intuitive
understanding). The transmitter $S$ chooses $L_a$ ($a = 1, 2, \ldots, N_n$), who is one




of the $N_n$ receivers, in advance and tries to send a message to $L_a$. (However,
$S$ cannot simultaneously communicate with more than one receiver. We will
treat such a simultaneous communication system in Chapter 7.) While all the
receivers receive a channel output corresponding to the message sent by $S$,
no receiver knows in advance whether the transmitted message is to him/her
or not. Therefore, it is necessary for the transmitter $S$ to send not only the
message $m \in \{1, 2, \ldots, M_n\}$ to be transmitted but also the information
$a \in \{1, 2, \ldots, N_n\}$ indicating the receiver $L_a$ to whom the message is transmitted
($a$ is called the address of the receiver $L_a$).



[Figure: transmitter $S$, channel $\mathbf{W}$, and the decoders of the receivers $L_1, \ldots, L_{N_n}$]

Fig. 6.2.



One of the ideas for realizing such communication that first comes to us
is as follows. The transmitter first sends the address $a$ of the receiver $L_a$
and then sends the message $m$ to be sent, both in encoded forms. Letting
$n$ be the number of times that the transmitter uses a channel $\mathbf{W}$, in this
communication method the transmitter uses $\mathbf{W}$ for the first $\alpha n$ times to
send the address $a$ and for the following $(1 - \alpha)n$ times to send the message
$m$, where $\alpha$ is some constant satisfying $0 < \alpha < 1$. Then, the channel coding
theorem (Theorem 3.2.1) given in Chapter 3 tells us that, letting $n$ be a
sufficiently large integer, if we choose $N_n$ and $M_n$ such that

$$\frac{1}{n}\log N_n \to \alpha C, \tag{6.8.1}$$

$$\frac{1}{n}\log M_n \to (1 - \alpha)C, \tag{6.8.2}$$




we can realize the communication with the error probability arbitrarily close
to zero. Here, for simplicity, we assume that $\mathbf{W}$ is a stationary memoryless
channel with the channel capacity $C$ (the time-sharing principle on the
channel coding does not always hold for general channels).

In this communication method based on the time-sharing principle, not
only the specified receiver $L_a$ but also the other receivers can know both the
address $a$ and the message $m$. However, since the objective of the
communication is realization of reliable transmission of the information from the
transmitter $S$ to the receiver $L_a$, it is not necessary at all that the same
information is correctly transmitted to the other receivers $L_b$ ($b \ne a$). That
is, the other receivers $L_b$ ($b \ne a$) need to know neither $a$ nor $m$ (this
property is often positively required in order to keep the communication secret).
Then, it turns out that we can use the identification code described in §6.2
for transmission of the address $a$ instead of using the transmission code. If
we use the identification code, the size $N_n$ of the addresses to which we can
transmit a message is written as

$$\frac{1}{n}\log\log N_n \to \alpha C \tag{6.8.3}$$

instead of (6.8.1). This means that $N_n$ can be quite large even if we use the
channel $\mathbf{W}$ $\alpha n$ times as before. In this case the size $M_n$ of the messages that
can be sent by using the transmission code is the same as (6.8.2).

We note here that there is a trade-off between the rates of the identi- 
fication code for transmission of an address and the transmission code for 
transmission of a message. That is, if we make one large, we need to make 
the other small. Let us consider the following two special cases in order to 
clarify the trade-off. First, consider the case of $M_n = 1$. Since the transmitter
does not need to send a message in this case, the transmitter can use the
channel only for specifying an address. Then, $N_n$ such that

$$\frac{1}{n}\log\log N_n \to C \tag{6.8.4}$$

can be chosen, which is greater than (6.8.3). Next, consider the case of
$N_n = 1$. Since the transmitter does not need to specify the address in this case, the
transmitter can use the channel only for sending a message. Then, $M_n$ such
that

$$\frac{1}{n}\log M_n \to C \tag{6.8.5}$$

can be chosen, which is greater than (6.8.2). Surprisingly, however, there
exists a code that enables us to send both an address and a message si- 
multaneously with the rates given in (6.8.4) and (6.8.5), respectively. Such a 
code is nothing but the identification-transmission code defined below. This 
identification-transmission code also guarantees secrecy of the address and 
the message to be transmitted. 
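
The three regimes discussed above, (6.8.1)-(6.8.2), (6.8.3) and (6.8.4)-(6.8.5), can be compared numerically. The Python sketch below simply evaluates the asymptotic relations at a finite $n$; treating the limits as equalities is an approximation made for illustration, and the scheme labels and function names are hypothetical.

```python
import math

def log_num_addresses(n, capacity, alpha, scheme):
    """Approximate log N_n (in nats) for large n under three schemes."""
    if scheme == "time_sharing_transmission":    # (6.8.1)
        return n * alpha * capacity
    if scheme == "time_sharing_identification":  # (6.8.3)
        return math.exp(n * alpha * capacity)
    if scheme == "identification_transmission":  # (6.8.4)
        return math.exp(n * capacity)
    raise ValueError(scheme)

def log_num_messages(n, capacity, alpha, scheme):
    """Approximate log M_n (in nats) for large n under the same schemes."""
    if scheme == "identification_transmission":  # (6.8.5)
        return n * capacity
    return n * (1.0 - alpha) * capacity           # (6.8.2)

n, capacity, alpha = 100, 0.5, 0.5
schemes = ["time_sharing_transmission",
           "time_sharing_identification",
           "identification_transmission"]
table = {s: (log_num_addresses(n, capacity, alpha, s),
             log_num_messages(n, capacity, alpha, s)) for s in schemes}
print(table)
```

With these sample values the logarithm of the number of addresses grows from 25 to $e^{25}$ to $e^{50}$, while the identification-transmission scheme also doubles the message exponent relative to time sharing.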






Now, we are ready to define the identification-transmission code (in fact,
the notion of identification-transmission code is essentially included in the
notion of the “identification code” defined in §6.2). Let $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ be an
arbitrary general channel with arbitrary input and output alphabets $\mathcal{X}$ and
$\mathcal{Y}$, respectively ($\mathcal{X}$ and $\mathcal{Y}$ are not restricted to finite sets). The
identification-transmission code for the channel $\mathbf{W}$ is defined as follows. We assume here
that there is one transmitter $S$ and there are $N_n$ receivers $L_1, L_2, \ldots, L_{N_n}$.
First, let $\mathcal{N}_n = \{1, 2, \ldots, N_n\}$ denote an address set and
$\mathcal{M}_n = \{1, 2, \ldots, M_n\}$ a message set. A transmitter chooses a deterministic encoder
$\varphi_n : \mathcal{N}_n \times \mathcal{M}_n \to \mathcal{X}^n$ in advance. If the transmitter wants to send a
message $m \in \mathcal{M}_n$ to a receiver $L_a$ ($a \in \mathcal{N}_n$), the transmitter computes
$\mathbf{u}_{a,m} = \varphi_n(a, m)$ and injects $\mathbf{u}_{a,m}$ into $W^n$. On the other hand, the receivers
prepare $N_n \times M_n$ subsets of $\mathcal{Y}^n$ called the decoding regions

$$\mathcal{D}_{a,m} \quad (a = 1, 2, \ldots, N_n;\ m = 1, 2, \ldots, M_n) \tag{6.8.6}$$

in advance satisfying

$$\mathcal{D}_{a,m} \cap \mathcal{D}_{a,m'} = \emptyset \quad (\forall m \ne m';\ \forall a = 1, 2, \ldots, N_n). \tag{6.8.7}$$

Notice here that

$$\mathcal{D}_a \equiv \bigcup_{m=1}^{M_n}\mathcal{D}_{a,m} \quad (a = 1, 2, \ldots, N_n) \tag{6.8.8}$$

is not required to be disjoint. Letting $\mathbf{y}$ be an output of the channel, each
receiver $L_a$ ($a = 1, 2, \ldots, N_n$) judges that the communication is intended for
$L_a$ if $\mathbf{y} \in \mathcal{D}_a$ and not intended for $L_a$ otherwise. When a receiver $L_a$ judges
that the communication is intended for $L_a$, $L_a$ searches for the unique $m$ such
that $\mathbf{y} \in \mathcal{D}_{a,m}$ ($m \in \mathcal{M}_n$) and judges that the message $m$ is transmitted,
where the uniqueness of $m$ is guaranteed by (6.8.7) and (6.8.8). The decoder
of $L_a$ is defined as the mapping $\psi_n^{(a)} : \mathcal{D}_a \to \mathcal{M}_n$ expressing this operation.
On the other hand, if the receiver $L_a$ judges that the communication is not
intended for $L_a$, $L_a$ does nothing. That is, $L_a$ is not interested in either the
address or the message contained in the communication to another receiver.
Here,

$$\rho_n = \frac{1}{n}\log\log N_n, \quad r_n = \frac{1}{n}\log M_n$$

are the coding rates of the identification-transmission code for the addresses
and the messages. We call $(\rho_n, r_n)$ the coding rate pair. Next, define

$$\mu_n^{(a)} = \frac{1}{M_n}\sum_{m=1}^{M_n} W^n(\mathcal{D}_{a,m}^c \mid \varphi_n(a, m)) \quad (a = 1, 2, \ldots, N_n), \tag{6.8.9}$$

$$\lambda_n^{(a,b)} = \frac{1}{M_n}\sum_{m=1}^{M_n} W^n(\mathcal{D}_a \mid \varphi_n(b, m)) \quad (a \ne b). \tag{6.8.10}$$




Here, $\mu_n^{(a)}$ is the error probability of the message when the receiver correctly
judges that the communication is intended for $L_a$ and $\lambda_n^{(a,b)}$ the probability
that $L_a$ misjudges that the communication is intended for $L_a$ when the
communication is actually intended for $L_b$ ($b \ne a$). In addition, since it follows
from (6.8.8) and (6.8.9) that

$$\frac{1}{M_n}\sum_{m=1}^{M_n} W^n(\mathcal{D}_a^c \mid \varphi_n(a, m)) \le \mu_n^{(a)}, \tag{6.8.11}$$

the probability that $L_a$ misrejects the communication intended for $L_a$ is
upper-bounded by $\mu_n^{(a)}$. Thus, we set

$$\mu_n = \max_{1\le a\le N_n}\mu_n^{(a)}, \tag{6.8.12}$$

$$\lambda_n = \max_{1\le a\ne b\le N_n}\lambda_n^{(a,b)}, \tag{6.8.13}$$

and call $\mu_n$ and $\lambda_n$ the error probability of the first kind and the error
probability of the second kind of the identification-transmission code, respectively.
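
The definitions (6.8.8)-(6.8.13) can be made concrete on a toy example. The Python sketch below is an illustration under stated assumptions: a single use of a binary symmetric channel with crossover probability 0.1, two addresses and one message, with addresses and messages indexed from 0 rather than 1. The data layout (dictionaries keyed by `(a, m)`) and the function names are hypothetical.

```python
def region_prob(channel, x, region):
    """W(region | x) for a channel given as {input: {output: probability}}."""
    return sum(channel[x].get(y, 0.0) for y in region)

def error_probabilities(channel, outputs, encoder, regions, num_addr, num_msg):
    """Return (mu_n, lambda_n) following (6.8.9)-(6.8.13)."""
    full = set(outputs)
    # D_a is the union of D_{a,m} over m, as in (6.8.8)
    union = {a: set().union(*(regions[(a, m)] for m in range(num_msg)))
             for a in range(num_addr)}
    # first kind: average over m of W(complement of D_{a,m} | phi(a, m)), maximized over a
    mu = max(
        sum(region_prob(channel, encoder[(a, m)], full - regions[(a, m)])
            for m in range(num_msg)) / num_msg
        for a in range(num_addr))
    # second kind: average over m of W(D_a | phi(b, m)), maximized over a != b
    lam = max(
        (sum(region_prob(channel, encoder[(b, m)], union[a])
             for m in range(num_msg)) / num_msg
         for a in range(num_addr) for b in range(num_addr) if a != b),
        default=0.0)
    return mu, lam

# Toy setting: one use of a binary symmetric channel, two addresses, one message.
bsc = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
encoder = {(0, 0): 0, (1, 0): 1}
regions = {(0, 0): {0}, (1, 0): {1}}
mu, lam = error_probabilities(bsc, [0, 1], encoder, regions, num_addr=2, num_msg=1)
print(mu, lam)
```

In this toy case both error probabilities equal the crossover probability, since a receiver errs exactly when the single transmitted bit is flipped.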

Remark 6.8.1. While $\mu_n^{(a)}$ and $\lambda_n^{(a,b)}$ defined by (6.8.9) and (6.8.10),
respectively, are defined as the average error probability under the assumption that
each message $m$ is generated subject to the uniform distribution over a
message set $\mathcal{M}_n$, we can also consider the identification-transmission code with
$\mu_n^{(a)}$ in (6.8.9) defined as the maximum error probability (see Remark 3.4.2)

$$\mu_n^{(a)} = \max_{1\le m\le M_n} W^n(\mathcal{D}_{a,m}^c \mid \varphi_n(a, m)). \tag{6.8.14}$$

In this case we have

$$\max_{1\le m\le M_n} W^n(\mathcal{D}_a^c \mid \varphi_n(a, m)) \le \mu_n^{(a)} \tag{6.8.15}$$

instead of (6.8.11). Equation (6.8.15) means that the probability that the
receiver misrejects the communication intended for $L_a$ is upper-bounded by
$\mu_n^{(a)}$ regardless of a message $m \in \mathcal{M}_n$. All the theorems given below still hold
if we define $\mu_n^{(a)}$ by either (6.8.9) or (6.8.14). Here, one would like to replace
$\lambda_n^{(a,b)}$ in (6.8.10) with

$$\max_{1\le m\le M_n} W^n(\mathcal{D}_a \mid \varphi_n(b, m)) \quad (a \ne b).$$

However, this replacement requires us to treat the problem in a completely
different manner. □



For simplicity, let us call a pair $(\varphi_n, \psi_n)$ of an encoder and a decoder
with the address set and the message set of sizes $N_n$ and $M_n$, respectively,
and the error probabilities of the first kind and the second kind $\mu_n$ and





$\lambda_n$, respectively, the $(n, N_n, M_n, \mu_n, \lambda_n)$-code. Here, we set
$\psi_n = (\psi_n^{(1)}, \psi_n^{(2)}, \ldots, \psi_n^{(N_n)})$. Notice that, as is clear from the definition above, the
$(n, N_n, M_n, \mu_n, \lambda_n)$-code is an $(n, N_n, \mu_n, \lambda_n)$-identification code with the address
set $\mathcal{N}_n$ and, simultaneously, an $(n, M_n, \mu_n)$-transmission code with the message set
$\mathcal{M}_n$ if $a = 1, 2, \ldots, N_n$ is arbitrarily fixed.

The identification-transmission coding problem is usually formulated as 
finding a region of the rate pair (pn,'f'n) subject to the condition that there 
exists a pair $(\varphi_n, \psi_n)$ of an encoder and a decoder causing $\mu_n$ and $\lambda_n$ less
than given values. We require that $\mu_n$ and $\lambda_n$ satisfy the conditions

$$\limsup_{n \to \infty} \mu_n \le \mu, \tag{6.8.16}$$

$$\limsup_{n \to \infty} \lambda_n \le \lambda, \tag{6.8.17}$$

respectively, for arbitrary fixed constants $0 \le \mu < 1$ and $0 \le \lambda < 1$, which is
one of the requirements usually imposed. We formulate the
identification-transmission coding problem subject to these conditions in the following form:

Definition 6.8.1.

A rate pair $(R_0, R)$ is $(\mu, \lambda)$-achievable
$\overset{\text{def}}{\Longleftrightarrow}$ There exists an $(n, N_n, M_n, \mu_n, \lambda_n)$-code satisfying

$$\limsup_{n \to \infty} \mu_n \le \mu, \quad \limsup_{n \to \infty} \lambda_n \le \lambda, \quad \liminf_{n \to \infty} \frac{1}{n} \log\log N_n = R_0 \quad \text{and} \quad \liminf_{n \to \infty} \frac{1}{n} \log M_n = R.$$

Definition 6.8.2 ($(\mu, \lambda)$-identification-transmission capacity region).

$$C_D(\mu, \lambda | W) = \{(R_0, R) \mid (R_0, R) \text{ is } (\mu, \lambda)\text{-achievable}\}.$$



Remark 6.8.2. Notice that in Definition 6.8.1

$$\liminf_{n \to \infty} \frac{1}{n} \log\log N_n = R_0, \tag{6.8.18}$$

$$\liminf_{n \to \infty} \frac{1}{n} \log M_n = R \tag{6.8.19}$$

appear instead of

$$\liminf_{n \to \infty} \frac{1}{n} \log\log N_n \ge R_0, \tag{6.8.20}$$

$$\liminf_{n \to \infty} \frac{1}{n} \log M_n \ge R. \tag{6.8.21}$$

We used definitions in a form such as (6.8.20) and (6.8.21) instead of
(6.8.18) and (6.8.19) in the transmission coding treated in Chapter 3 and the
identification coding treated in this chapter. However, in the
identification-transmission coding the definitions using (6.8.18) and (6.8.19) are more
convenient. Readers will see the convenience after reading the proof of the converse
theorem. □



We first have the following direct theorem under the definitions given 
above. 

Theorem 6.8.1 (Direct theorem: Han and Verdú [45]). Let $W$ be an
arbitrary channel. Then, for all $0 \le \mu \le \lambda$ it holds that

$$C_D(\mu, \lambda | W) \supset \{(R_0, R) \mid 0 \le R_0 \le R \le C(\mu | W)\}, \tag{6.8.22}$$

where $C(\mu | W)$ denotes the $\mu$-channel capacity defined in §3.4 (we set $\varepsilon = \mu$
here). □



Proof. We have only to construct a code similarly to the proof of Theorem 6.2.1.
Here, we set $\varepsilon = \mu$ and replace $R = C(\mu | W) - \gamma$ with an arbitrary
$R$ satisfying $0 \le R \le C(\mu | W)$. Moreover, we replace
$\liminf_{n \to \infty} \frac{1}{n} \log M_n \ge R$ with $\liminf_{n \to \infty} \frac{1}{n} \log M_n = R$.
We can show that any rate pair $(R_0, R)$ satisfying
$0 \le R_0 \le R \le C(\mu | W)$ is $(\mu, \lambda)$-achievable by reducing the number of the
codewords of the identification code with rate $R$ in Theorem 6.2.1 down to
rate $R_0$. □

Next, we give the converse theorem. We need here the assumption that
$\mathcal{X}$ is a finite input alphabet.

Theorem 6.8.2 (Converse theorem: Han and Verdú [45]). Let $W$ be an
arbitrary channel with a finite input alphabet $\mathcal{X}$. Then, for all $\mu \ge 0$ and
$\lambda \ge 0$ satisfying $\mu + \lambda < 1$ it holds that

$$C_D(\mu, \lambda | W) \subset \{(R_0, R) \mid 0 \le R_0 \le R \le C(\mu | W)\}, \tag{6.8.23}$$

where $C(\mu | W)$ denotes the $\mu$-channel capacity defined in §3.4 (we set $\varepsilon = \mu$
here). □



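Proof. Suppose that $(R_0, R)$ is $(\mu, \lambda)$-achievable. Then, there exists an
$(n, N_n, M_n, \mu_n, \lambda_n)$-code satisfying

$$\liminf_{n \to \infty} \frac{1}{n} \log\log N_n = R_0, \tag{6.8.24}$$

$$\liminf_{n \to \infty} \frac{1}{n} \log M_n = R \tag{6.8.25}$$

and

$$\limsup_{n \to \infty} \mu_n \le \mu, \quad \limsup_{n \to \infty} \lambda_n \le \lambda. \tag{6.8.26}$$

Let $(\varphi_n, \psi_n)$ denote the pair of the encoder and the decoder of the
$(n, N_n, M_n, \mu_n, \lambda_n)$-code, where $\psi_n = (\psi_n^{(1)}, \psi_n^{(2)}, \cdots, \psi_n^{(N_n)})$. If we treat

$$\mathbf{v}_a = (\varphi_n(a, 1), \varphi_n(a, 2), \cdots, \varphi_n(a, M_n))$$

as an $M_n$-dimensional vector for each $a = 1, 2, \cdots, N_n$, the $N_n$ vectors
$\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_{N_n}$ must be distinct. This can be verified in the following way.
Suppose that $\mathbf{v}_a = \mathbf{v}_b$ for some $a \neq b$. Then, it follows that

$$\begin{aligned}
\lambda_n \ge \lambda_n^{(b,a)} &= \frac{1}{M_n} \sum_{m=1}^{M_n} W^n(\mathcal{D}_a \mid \varphi_n(b,m)) \\
&= \frac{1}{M_n} \sum_{m=1}^{M_n} W^n(\mathcal{D}_a \mid \varphi_n(a,m)) \\
&= 1 - \frac{1}{M_n} \sum_{m=1}^{M_n} W^n(\mathcal{D}_a^c \mid \varphi_n(a,m)) \\
&\ge 1 - \mu_n.
\end{aligned}$$

Hence, we have

$$\limsup_{n \to \infty} \lambda_n \ge 1 - \limsup_{n \to \infty} \mu_n, \tag{6.8.27}$$

which contradicts the assumption $\mu + \lambda < 1$ due to (6.8.26). Therefore, all of
the $N_n$ vectors must be distinct.

On the other hand, we notice here that there are at most $(|\mathcal{X}|^n)^{M_n}$ distinct
$M_n$-dimensional vectors all the components of which belong to $\mathcal{X}^n$. Hence,

$$N_n \le |\mathcal{X}|^{n M_n}$$

must be satisfied. Then, we obtain

$$\frac{1}{n} \log\log N_n \le \frac{1}{n} \log M_n + \frac{1}{n} \log n + \frac{1}{n} \log\log |\mathcal{X}|,$$

which yields

$$\liminf_{n \to \infty} \frac{1}{n} \log\log N_n \le \liminf_{n \to \infty} \frac{1}{n} \log M_n. \tag{6.8.28}$$

We can establish $R_0 \le R$ by substituting (6.8.24) and (6.8.25) into (6.8.28).
Furthermore, since (6.8.9), (6.8.12) and Definition 6.8.1 imply that for any
fixed $a \in \mathcal{N}_n$, $R$ is $\mu$-achievable as a rate of the transmission code for the
channel $W$, $R \le C(\mu | W)$ follows from Definition 3.4.2 in Chapter 3. □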
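
The counting step in the proof above can be checked numerically. The following is a minimal sketch, assuming natural logarithms; the function names and the sample parameters are illustrative and do not appear in the text. It verifies the identity $\log\log |\mathcal{X}|^{nM_n} = \log M_n + \log n + \log\log|\mathcal{X}|$ on a small case and shows that, with $\log M_n = nR$, the bound on $\frac{1}{n}\log\log N_n$ approaches $R$.

```python
import math


def max_addresses_exact(alphabet_size: int, n: int, m_n: int) -> int:
    """Number of distinct M_n-tuples of length-n codewords: |X|^(n * M_n)."""
    return alphabet_size ** (n * m_n)


def loglog_rate_bound(alphabet_size: int, n: int, log_m_n: float) -> float:
    """(1/n) log M_n + (1/n) log n + (1/n) log log |X|, in nats.

    log_m_n is passed directly so that large M_n = exp(n R) does not overflow.
    """
    return (log_m_n + math.log(n) + math.log(math.log(alphabet_size))) / n


# Small exact check: binary alphabet, n = 3, M_n = 4 (illustrative values).
count = max_addresses_exact(2, 3, 4)
lhs = math.log(math.log(count)) / 3
rhs = loglog_rate_bound(2, 3, math.log(4))

# Asymptotic check: with log M_n = n R the bound tends to R as n grows.
R = 0.5
gap = loglog_rate_bound(2, 10_000, 10_000 * R) - R
```

The remaining gap is of order $(\log n)/n$, which is why the double-logarithmic address rate cannot exceed the message rate for deterministic encoders.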



The combination of Theorem 6.8.1 and Theorem 6.8.2 immediately yields 
the following theorem. 



Theorem 6.8.3 (Fundamental theorem). Let $W$ be an arbitrary channel
with a finite input alphabet $\mathcal{X}$. Then, for all $\mu \ge 0$ and $\lambda \ge 0$ satisfying
$\mu \le \lambda$ and $\mu + \lambda < 1$ it holds that

$$C_D(\mu, \lambda | W) = \{(R_0, R) \mid 0 \le R_0 \le R \le C(\mu | W)\}. \tag{6.8.29}$$

Remark 6.8.3. This fundamental theorem differs from the result that first
appeared in Han and Verdú [45] in that the former imposes
the additional restriction $R_0 \le R$ on the right-hand side of (6.8.29). However,
as far as we consider only the deterministic encoders $\varphi_n : \mathcal{N}_n \times \mathcal{M}_n \to \mathcal{X}^n$
defined above, we have the fundamental theorem given here (compare
Theorem 6.8.3 with Theorem 6.8.8 concerning stochastic encoders $\varphi_n$ that will
appear below). □



If in Theorem 6.8.3 we consider the case of $\mu = \lambda = 0$, we have the
following corollary on the identification-transmission coding with the error
probabilities of the first kind and the second kind satisfying
$\lim_{n \to \infty} \mu_n = 0$ and $\lim_{n \to \infty} \lambda_n = 0$, respectively.
This corollary includes the intuitive example given
at the beginning of this section as a special case of $R_0 = R = C(W)$.

Corollary 6.8.1. For an arbitrary channel $W$ with a finite alphabet $\mathcal{X}$ we
have

$$C_D(W) = \{(R_0, R) \mid 0 \le R_0 \le R \le C(W)\}, \tag{6.8.30}$$

where $C(W) = C(0 | W)$ is the channel capacity of $W$ defined in
Definition 3.1.2 and we set $C_D(W) = C_D(0, 0 | W)$. □



Let us consider here the case where all of the codewords (the outputs of
the encoder $\varphi_n$) satisfy cost constraint $\Gamma$. We denote by $C_D(\mu, \lambda, \Gamma | W)$ the
$(\mu, \lambda)$-identification-transmission capacity region in this case. The following
theorem can be easily verified by using the argument yielding Theorem 6.8.3.

Theorem 6.8.4. Let $W$ be an arbitrary channel with a finite input alphabet
$\mathcal{X}$. Then, for all $\mu \ge 0$ and $\lambda \ge 0$ satisfying $\mu \le \lambda$ and $\mu + \lambda < 1$ it holds
that

$$C_D(\mu, \lambda, \Gamma | W) = \{(R_0, R) \mid 0 \le R_0 \le R \le C_s(\mu, \Gamma | W)\}. \tag{6.8.31}$$

While in Theorem 6.8.3 and Theorem 6.8.4 the size of the input alphabet
$\mathcal{X}$ is assumed to be finite, we can drop this assumption if the channel $W$
satisfies the “stability” condition given in §6.7. Let us consider the additive
white Gaussian noise channel (AWGN channel) defined in §3.7 as an example.
Denoting by $P$ and $N$ signal power and noise power, respectively, we define
the identification-transmission code (with cost constraint) for the AWGN
channel as the code such that all the outputs $\mathbf{u}_{a,m} = \varphi_n(a, m)$ from the
encoder $\varphi_n$ satisfy cost constraint $P$ (see §3.7). Letting $C_D(\mu, \lambda, P | W)$ denote
the $(\mu, \lambda)$-identification-transmission capacity region for the AWGN channel
with signal power $P$ (see Definition 6.8.2), we have the following theorem.



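Theorem 6.8.5. Let $W$ be the AWGN channel. Then, for all $\mu \ge 0$ and
$\lambda \ge 0$ satisfying $\mu \le \lambda$ and $\mu + \lambda < 1$ it holds that

$$C_D(\mu, \lambda, P | W) = \left\{ (R_0, R) \,\middle|\, 0 \le R_0 \le R \le \frac{1}{2} \log\left(1 + \frac{P}{N}\right) \right\}. \tag{6.8.32}$$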



Proof.

1) Direct part:

In this part of the proof we do not use $\mu + \lambda < 1$ (we only use $\mu \le \lambda$ here).
Since Theorem 3.7.4 guarantees that the AWGN channel satisfies the strong
converse property, in view of Theorem 3.7.3 and Remark 3.7.1 we have

$$C_s(\mu, P | W) = C_s(P | W) = \frac{1}{2} \log\left(1 + \frac{P}{N}\right) \quad (0 \le \forall \mu < 1), \tag{6.8.33}$$

where $C_s(\mu, P | W)$ denotes the $(\mu, P)$-cost channel capacity of the channel
$W$ (see Definition 3.6.4). Therefore, by noticing $\mu \le \lambda$ we can obtain

$$C_D(\mu, \lambda, P | W) \supset \left\{ (R_0, R) \,\middle|\, 0 \le R_0 \le R \le \frac{1}{2} \log\left(1 + \frac{P}{N}\right) \right\}$$

from Theorem 6.8.1 (more precisely, the extended version of Theorem 6.8.1
to the case with cost constraint).



2) Converse part:

In this part of the proof we use $\mu + \lambda < 1$. First, suppose that $(R_0, R)$ is
$(\mu, \lambda)$-achievable as a rate pair of the identification-transmission code. Then,
there exists an $(n, N_n, M_n, \mu_n, \lambda_n)$-identification-transmission code satisfying

$$\limsup_{n \to \infty} \mu_n \le \mu, \tag{6.8.34}$$

$$\limsup_{n \to \infty} \lambda_n \le \lambda, \tag{6.8.35}$$

$$\liminf_{n \to \infty} \frac{1}{n} \log\log N_n = R_0, \tag{6.8.36}$$

$$\liminf_{n \to \infty} \frac{1}{n} \log M_n = R, \tag{6.8.37}$$

and cost constraint $P$. Let $\varphi_n : \mathcal{N}_n \times \mathcal{M}_n \to \mathcal{X}^n$ be the encoder of this code,
where $\mathcal{N}_n = \{1, 2, \cdots, N_n\}$ and $\mathcal{M}_n = \{1, 2, \cdots, M_n\}$. Next, we quantize
probability distributions as was shown in §6.7. That is, in §6.7 we partitioned
the $n$-dimensional hypercube $V_n(\Gamma)$ (we replace $\Gamma$ with $P$ here) depending on
the cost constraint $P$ into $k_n(P)$ small hypercubes $A_n^{(i)}$ ($i = 1, 2, \cdots, k_n(P)$),
defined a representative point $\mathbf{u}_i$ in each hypercube $A_n^{(i)}$ and set

$$\mathcal{R}_n(P) = \{\mathbf{u}_1, \mathbf{u}_2, \cdots, \mathbf{u}_{k_n(P)}\} \tag{6.8.38}$$

(see (6.7.9)). Now, we define another deterministic encoder
$\varphi_n^* : \mathcal{N}_n \times \mathcal{M}_n \to \mathcal{R}_n(P)$ such that for each $(a, m) \in \mathcal{N}_n \times \mathcal{M}_n$,
$\varphi_n^*(a, m) = \mathbf{u}_i$, where $\mathbf{u}_i$ is the representative point of the small hypercube including
$\mathbf{u}_{a,m} = \varphi_n(a, m)$. We next define the $(n, N_n, M_n, \mu_n^*, \lambda_n^*)$-identification-transmission code
as the identification-transmission code with the encoder $\varphi_n^*$ instead of $\varphi_n$ and
the same decoding regions as the $(n, N_n, M_n, \mu_n, \lambda_n)$-identification-transmission
code. Then, it follows from Lemma 6.7.1, (6.8.34) and (6.8.35) that

$$\limsup_{n \to \infty} \mu_n^* \le \mu, \tag{6.8.39}$$

$$\limsup_{n \to \infty} \lambda_n^* \le \lambda. \tag{6.8.40}$$

We can now establish $R_0 \le R$ by noticing

$$\limsup_{n \to \infty} \frac{1}{n} \log\log k_n(P) = 0$$

(see (6.7.8)) and $\mu + \lambda < 1$ and using (6.8.36) and (6.8.37) in the same
way as the proof of Theorem 6.8.2 (more precisely, the extended version of
Theorem 6.8.2 to the case with cost constraint). In the proof, however, we
use $\mathcal{R}_n(P)$ instead of $\mathcal{X}^n$. On the other hand,

$$R \le C_s(\mu, P | W) = C_s(P | W) = \frac{1}{2} \log\left(1 + \frac{P}{N}\right) \quad (0 \le \forall \mu < 1)$$

immediately follows because $R$ is $\mu$-achievable as a rate of the transmission
code. □



Remark 6.8.4. The proof of Theorem 6.8.2 on the identification-transmission
coding, in which we need not consider the relationship to the channel
resolvability problem, is much easier than the proof of the converse
theorem on the identification coding given in Lemma 6.4.1. This is because 
the identification-transmission code has a particular structure related to the 
transmission code when the identification-transmission code is treated as the 
identification code. □ 



Remark 6.8.5. Note that the fundamental theorem on the identification- 
transmission coding (Theorem 6.8.3) does not require the “strong converse 
property” of a channel, which is different from the fundamental theorem on 
the identification code (Theorem 6.4.2), which holds only under the assump- 
tion of the strong converse property of the channel. In this sense, we can 
regard the identification-transmission code as more general than the identi- 
fication code. □ 






Remark 6.8.6. In the identification coding treated in §6.2 it is essential to
generate a channel input subject to the probability distribution $Q_i$ over $\mathcal{X}^n$
when $i \in \mathcal{N}_n$ is transmitted. However, in order to generate the channel
input we need to prepare quite a lot of random number generators (stochastic
encoders $\varphi_n$). In fact, letting $n$ denote the block length, we need to prepare
$N_n$ random number generators, a number that grows as a double-exponential
function of $n$. On the other hand, in the identification-transmission coding
we do not need such random number generators because the encoder is
deterministic. Nevertheless, we can consider that the identification-transmission
code includes a random number generator in some sense. That is, since the
message $m \in \mathcal{M}_n$ is a random variable subject to the uniform distribution
over $\mathcal{M}_n$, the output $\mathbf{u}_{a,m} = \varphi_n(a, m)$ from the encoder of the
identification-transmission code is also a random variable for each address $a \in \mathcal{N}_n$, and
therefore, the probability distribution $Q_a$ of $\mathbf{u}_{a,m}$ plays the role of the
codeword in the identification coding. We can see from this viewpoint that the
encoder of the identification-transmission code does not require any auxiliary
random number generator but adequately makes use of the “built-in”
randomness contained in the message. This randomness, however, cannot yield
randomness more than itself (see Remark 2.1.3 in Chapter 2). This is why
the restriction $R_0 \le R$ appears on the right-hand side of (6.8.29). □



Remark 6.8.7. As can be seen from the proofs of Theorem 6.2.1 and
Theorem 6.8.1, we construct the identification code and the
identification-transmission code based on an $(n, M_n, \varepsilon_n)$-transmission code determined in
advance. Readers interested in the constructions using an algebraic code (such
as the constant weight code) or a hash function should see Verdú and Wei
[93] and Kurosawa and Yoshida [60]. □

Remark 6.8.8. We can modify the identification-transmission code in such
a way that receivers have message sets $\mathcal{M}_n$ of different sizes. That is, the
rates $R$ for respective receivers can be different from one another. Under this
modification we consider $N_n$ rates $R^{(1)}, R^{(2)}, \cdots, R^{(N_n)}$, where $N_n$ is the
number of receivers. □



Remark 6.8.9. So far we have assumed that the message $J_n$ is subject to
the uniform distribution over a message set $\mathcal{M}_n$. This assumption is
necessary so as to keep the probability of the misacceptance small. However,
even for the case that $J_n$ is not subject to the uniform distribution, we can
also define a reliable identification-transmission code in the following way.
After preparing an adequate set $\tilde{\mathcal{M}}_n$ (which substitutes for $\mathcal{M}_n$), consider
an adequate “stochastic” transform $\phi_n : \mathcal{M}_n \to \tilde{\mathcal{M}}_n$ such that

$$\tilde{\mathcal{M}}_n^{(m)} = \{\tilde{m} \in \tilde{\mathcal{M}}_n \mid \Pr\{\phi_n(m) = \tilde{m}\} > 0\} \quad (\forall m \in \mathcal{M}_n),$$

each of which is a subset of $\tilde{\mathcal{M}}_n$, are disjoint and $\phi_n(J_n)$ is subject to a
probability distribution nearly equal to the uniform distribution. Then, we
can define a reliable identification-transmission code with $\tilde{\mathcal{M}}_n$ as the new
message set in the same way as before. Notice here that an original message
$m \in \mathcal{M}_n$ is uniquely reproduced from $\tilde{m} \in \tilde{\mathcal{M}}_n$. □



Now, let us consider the case where the encoder $\varphi_n : \mathcal{N}_n \times \mathcal{M}_n \to \mathcal{X}^n$ of
the identification-transmission code can be stochastic. That is, we consider
the case that for each $(a, m) \in \mathcal{N}_n \times \mathcal{M}_n$ the encoder $\varphi_n$ randomly generates
a codeword subject to a probability distribution $Q_{a,m}$ over $\mathcal{X}^n$ dependent
on $(a, m)$ (see §6.2). (The special case of $|\mathcal{M}_n| = 1$ yields the identification
code itself defined at the beginning of this section.) Let $C_D^*(\mu, \lambda | W)$ denote
the $(\mu, \lambda)$-identification-transmission capacity region (see Definition 6.8.2)
of this stochastic case. Then, we have the following theorem corresponding
to Theorem 6.8.1. This theorem means that the identification-transmission
capacity region is expanded if we permit a stochastic encoder.

Theorem 6.8.6 (Direct theorem). Let $W$ be an arbitrary channel. Then,
for all $\mu \ge 0$ and $\lambda \ge 0$ with $\mu \le \lambda$ it holds that

$$C_D^*(\mu, \lambda | W) \supset \{(R_0, R) \mid 0 \le R_0 \le C(\mu | W),\ 0 \le R \le C(\mu | W)\}. \tag{6.8.41}$$



Proof. Since the achievability for the case of $0 \le R_0 \le R \le C(\mu | W)$ is
clear from Theorem 6.8.1, we have only to consider the case of
$0 \le R < R_0 \le C(\mu | W)$. First, we notice that $R_0$ is $\mu$-achievable as a rate of the
transmission code for the channel $W$ due to $R_0 \le C(\mu | W)$. Thus, there
exists an $(n, L_n, \mu_n)$-transmission code satisfying

$$\limsup_{n \to \infty} \mu_n \le \mu, \tag{6.8.42}$$

$$\liminf_{n \to \infty} \frac{1}{n} \log L_n = R_0. \tag{6.8.43}$$

Denote by

$$\mathcal{C}_n = \{\mathbf{u}_1, \mathbf{u}_2, \cdots, \mathbf{u}_{L_n}\} \quad (\mathbf{u}_k \in \mathcal{X}^n)$$

the set of codewords of this code and $U_k$ the decoding region for $\mathbf{u}_k$ for each
$k = 1, 2, \cdots, L_n$. We construct $N_n$ subsets $A_1, A_2, \cdots, A_{N_n}$ from $\mathcal{C}_n$ in the
same way as the proof of Theorem 6.2.1. Then, we have

$$\liminf_{n \to \infty} \frac{1}{n} \log |A_a| = R_0 \quad (a = 1, 2, \cdots, N_n) \tag{6.8.44}$$

and

$$\liminf_{n \to \infty} \frac{1}{n} \log\log N_n = R_0. \tag{6.8.45}$$

Next, for each $a = 1, 2, \cdots, N_n$ we partition $A_a$ into $M_n$ disjoint subsets $A_{a,m}$
($m = 1, 2, \cdots, M_n$) of size $|A_a| / M_n$ satisfying

$$A_a = \bigcup_{m=1}^{M_n} A_{a,m} \quad (a = 1, 2, \cdots, N_n), \tag{6.8.46}$$

where $M_n$ is assumed to satisfy

$$\lim_{n \to \infty} \frac{1}{n} \log M_n = R. \tag{6.8.47}$$

Since we are considering the case of $R_0 > R$, due to (6.8.44) and (6.8.47)
such a partition is always possible. Now, set

$$S_{a,m} = \bigcup_{k \in A_{a,m}} \{\mathbf{u}_k\}, \tag{6.8.48}$$

$$\mathcal{D}_{a,m} = \bigcup_{k \in A_{a,m}} U_k, \tag{6.8.49}$$

$$\mathcal{D}_a = \bigcup_{m=1}^{M_n} \mathcal{D}_{a,m} \tag{6.8.50}$$

and denote by $Q_{a,m}$ the uniform distribution over $S_{a,m}$. We define a stochastic
encoder $\varphi_n : \mathcal{N}_n \times \mathcal{M}_n \to \mathcal{X}^n$ by $\varphi_n(a, m) = Q_{a,m}$, where
$\mathcal{N}_n = \{1, 2, \cdots, N_n\}$ and $\mathcal{M}_n = \{1, 2, \cdots, M_n\}$. Then, we can define an
$(n, N_n, M_n, \mu_n, \lambda_n)$-identification-transmission code with the stochastic encoder $\varphi_n$ and
the decoding regions determined by (6.8.49) and (6.8.50). We can prove that
this identification-transmission code satisfies (6.8.42) and, in addition, satisfies

$$\limsup_{n \to \infty} \lambda_n \le \lambda$$

by using the assumption $\mu \le \lambda$ in the same way as the proof of Theorem 6.2.1.
By recalling (6.8.45) and (6.8.47), we can conclude that $(R_0, R)$ is
$(\mu, \lambda)$-achievable as a rate pair of the identification-transmission code. □
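
The partition-and-sample step of this construction can be sketched in a few lines. This is a minimal illustration only: the codebook is a toy dictionary of labels, indices start at 0 rather than 1, $|A_a|$ is assumed to be an exact multiple of $M_n$, and the function names are hypothetical rather than taken from the text.

```python
import random


def partition_equal(a_a, m_n):
    """Split the index list A_a into M_n disjoint cells A_{a,m} of equal size.

    Assumes len(a_a) is a multiple of m_n (a simplification of this sketch).
    """
    if len(a_a) % m_n != 0:
        raise ValueError("|A_a| must be a multiple of M_n in this sketch")
    size = len(a_a) // m_n
    return [a_a[j * size:(j + 1) * size] for j in range(m_n)]


def stochastic_encode(codebook, cells, m, rng):
    """Draw a codeword uniformly from S_{a,m} = {u_k : k in A_{a,m}}."""
    k = rng.choice(cells[m])
    return codebook[k]


# Toy data: 12 codewords u_0..u_11, one address set A_a, M_n = 3 messages.
codebook = {k: f"u{k}" for k in range(12)}
a_a = list(range(12))
cells = partition_equal(a_a, 3)

rng = random.Random(0)
word = stochastic_encode(codebook, cells, 1, rng)
```

The decoding region for $(a, m)$ would then be the union of the transmission-code decoding regions $U_k$ over the same cell, as in (6.8.49).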

Next, we give the converse theorem on the identification-transmission code 
for the case that the stochastic encoder is permitted. In order to establish 
the converse theorem, we need the strong converse property of a channel and 
the finiteness of the input alphabet that are not needed in Theorem 6.8.6. 

Theorem 6.8.7 (Converse theorem). Let $W$ be an arbitrary channel
with a finite input alphabet $\mathcal{X}$ satisfying the strong converse property. Then,
for all $\mu \ge 0$ and $\lambda \ge 0$ with $\mu + \lambda < 1$ it holds that

$$C_D^*(\mu, \lambda | W) \subset \{(R_0, R) \mid 0 \le R_0 \le C(W),\ 0 \le R \le C(W)\}, \tag{6.8.51}$$

where $C(W)$ denotes the channel capacity of $W$ defined in Definition 3.1.2. □






Proof. Suppose that $(R_0, R)$ is $(\mu, \lambda)$-achievable as a rate pair of the
identification-transmission code. Then, there exists an
$(n, N_n, M_n, \mu_n, \lambda_n)$-identification-transmission code satisfying

$$\limsup_{n \to \infty} \mu_n \le \mu,$$

$$\limsup_{n \to \infty} \lambda_n \le \lambda,$$

$$\liminf_{n \to \infty} \frac{1}{n} \log\log N_n = R_0,$$

$$\liminf_{n \to \infty} \frac{1}{n} \log M_n = R.$$

Since $R_0$ is $(\mu, \lambda)$-achievable as a rate of the identification code, we have
$R_0 \le C(W)$ by using Corollary 6.4.1 on the identification capacity together
with the strong converse property of the channel, the finiteness of the input
alphabet and $\mu + \lambda < 1$, which are assumptions of the theorem.

Next, we use Lemma 3.8.2 in §3.8 for establishing $R \le C(W)$. Let
$V^n = U_{M_n}$ be the random variable subject to the uniform distribution over
the message set $\mathcal{M}_n = \{1, 2, \cdots, M_n\}$ and define a source as $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$.
Then, Lemma 3.8.2 tells us that Theorem 3.5.1 in Chapter 3 (strong converse
theorem) holds for the case that the encoder is stochastic (we use Lemma 3.8.2
instead of Lemma 3.2.2 in the proof of sufficiency of Theorem 3.5.1). This
means that Remark 3.5.1 in Chapter 3 is still valid if the encoder is stochastic.
Then, $R \le C(W)$ follows by noticing that $R$ is $\mu$-achievable ($0 \le \mu < 1$)
as a rate of the transmission code. □

The combination of Theorems 6.8.6 and 6.8.7 immediately yields the fol- 
lowing theorem. 

Theorem 6.8.8. Let $W$ be an arbitrary channel with a finite input alphabet
$\mathcal{X}$ satisfying the strong converse property. Then, for all $\mu \ge 0$ and $\lambda \ge 0$
satisfying $\mu \le \lambda$ and $\mu + \lambda < 1$ it holds that

$$C_D^*(\mu, \lambda | W) = \{(R_0, R) \mid 0 \le R_0 \le C(W),\ 0 \le R \le C(W)\}, \tag{6.8.52}$$

where $C(W)$ denotes the channel capacity of $W$ defined in Definition 3.1.2. □



Here, denote by $C_D^*(\mu, \lambda, \Gamma | W)$ the $(\mu, \lambda)$-identification-transmission
capacity region for the case that the encoder $\varphi_n$ is stochastic and all of the
outputs from $\varphi_n$ satisfy cost constraint $\Gamma$. The following theorem is easily
obtained by using the same argument yielding Theorem 6.8.8.

Theorem 6.8.9. Let $W$ be an arbitrary channel with a finite input alphabet
$\mathcal{X}$ satisfying the strong converse property under cost constraint $\Gamma$. Then, for
all $\mu \ge 0$ and $\lambda \ge 0$ satisfying $\mu \le \lambda$ and $\mu + \lambda < 1$ it holds that

$$C_D^*(\mu, \lambda, \Gamma | W) = \{(R_0, R) \mid 0 \le R_0 \le C_s(\Gamma | W),\ 0 \le R \le C_s(\Gamma | W)\}. \tag{6.8.53}$$



Example 6.8.1. Any stationary memoryless channel $W$ with cost constraint
whose input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$ are finite sets satisfies
the assumption in Theorem 6.8.9 (see Theorem 3.7.2). Thus, we have (6.8.53)
for this $W$. □



Now, let us consider the identification-transmission code for the AWGN
channel again as an example of a channel whose input alphabet is not a finite
set. We consider here the case where the encoder can be stochastic. Denoting
by $C_D^*(\mu, \lambda, P | W)$ the $(\mu, \lambda)$-identification-transmission capacity region of
the AWGN channel with signal power $P$ (see Definition 6.8.2), we have the
following theorem corresponding to Theorem 6.8.5.

Theorem 6.8.10. Let $W$ be the AWGN channel. Then, for all $\mu \ge 0$ and
$\lambda \ge 0$ satisfying $\mu \le \lambda$ and $\mu + \lambda < 1$ it holds that

$$C_D^*(\mu, \lambda, P | W) = \left\{ (R_0, R) \,\middle|\, 0 \le R_0 \le \frac{1}{2} \log\left(1 + \frac{P}{N}\right),\ 0 \le R \le \frac{1}{2} \log\left(1 + \frac{P}{N}\right) \right\}. \tag{6.8.54}$$



Proof. By noticing (6.8.33) and $\mu \le \lambda$, it turns out that the direct part is
included in Theorem 6.8.6 (extended to the case with cost constraint). On
the other hand, since the AWGN channel satisfies the strong converse
property under each cost constraint $P$, we can prove the converse part in
the same way as the proof of Theorem 6.8.7 (extended to the case with cost
constraint), considering Theorem 6.7.2 and (6.8.33). □
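
The two AWGN regions of Theorem 6.8.5 (deterministic encoder) and Theorem 6.8.10 (stochastic encoder) differ only in the constraint $R_0 \le R$. The sketch below makes the difference concrete; rates are in nats, and the function names and numerical values are illustrative assumptions, not part of the text.

```python
import math


def awgn_capacity(signal_power: float, noise_power: float) -> float:
    """(1/2) log(1 + P/N) in nats."""
    return 0.5 * math.log(1.0 + signal_power / noise_power)


def in_deterministic_region(r0, r, signal_power, noise_power):
    """Region (6.8.32): 0 <= R0 <= R <= capacity."""
    return 0.0 <= r0 <= r <= awgn_capacity(signal_power, noise_power)


def in_stochastic_region(r0, r, signal_power, noise_power):
    """Region (6.8.54): R0 and R are bounded by the capacity separately."""
    cap = awgn_capacity(signal_power, noise_power)
    return 0.0 <= r0 <= cap and 0.0 <= r <= cap


# Illustrative values: P = 3, N = 1, so the capacity equals log 2 nats.
cap = awgn_capacity(3.0, 1.0)
det_ok = in_deterministic_region(0.3, 0.5, 3.0, 1.0)
det_fail = in_deterministic_region(0.5, 0.3, 3.0, 1.0)   # violates R0 <= R
sto_ok = in_stochastic_region(0.5, 0.3, 3.0, 1.0)        # allowed once the encoder is stochastic
```

The pair $(R_0, R) = (0.5, 0.3)$ lies outside the deterministic region but inside the stochastic one, which is the expansion of the capacity region described after Theorem 6.8.6.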



We conclude this section by mentioning another aspect of the
identification-transmission coding system as a cryptographic system (secure communication
system). As was given at the beginning of this section, the identification
coding consists of one transmitter $S$ and receivers $L_1, L_2, \cdots, L_{N_n}$. For each
$a = 1, 2, \cdots, N_n$ the receiver $L_a$ decodes an address $a$ and a message $m$ from
a channel output $\mathbf{y} \in \mathcal{Y}^n$ by using his/her decoding regions

$$\mathcal{C}_a = \{\mathcal{D}_{a,m} \mid m = 1, 2, \cdots, M_n\}.$$

If each receiver keeps $\mathcal{C}_a$ secret from the other receivers, the other receivers
cannot know either $a$ or $m$.

We can make the communication system more secure in the following
way. Letting $\mathcal{M}_n = \{1, 2, \cdots, M_n\}$ be a message set, each receiver $L_a$
adequately chooses a permutation $\sigma_a$ over $\mathcal{M}_n$ and keeps $\sigma_a$ secret from the
other receivers. When the transmitter $S$ communicates with the receiver $L_a$,
$S$ applies the permutation $\sigma_a$ to a message $m \in \mathcal{M}_n$ to be transmitted and
sends $\sigma_a(m)$ in encoded form. On the other hand, the receiver $L_a$ reproduces
a message $m'$ based on his/her decoding regions $\mathcal{C}_a$ from the channel output
$\mathbf{y} \in \mathcal{Y}^n$ and computes $\hat{m} = \sigma_a^{-1}(m')$ by applying the inverse permutation
$\sigma_a^{-1}$ to $m'$. The receiver $L_a$ judges that $\hat{m}$ is transmitted from $S$. If we
use the identification-transmission code with such permutations of respective
receivers, we can keep the message more secure, though the secrecy on the
address is not improved. In fact, even for the case that the decoding regions
$\mathcal{C}_a$ are revealed to the other receivers, the message is still secure as long as
$L_a$ keeps the permutation $\sigma_a$ secret.

While the identification-transmission code is suitable for keeping the com- 
munication secure for the case that many users share one channel, such a 
secure communication cannot be realized by the usual transmission code. 
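
The permutation step described above can be illustrated with a short sketch. It assumes an error-free channel (so that $m' = \sigma_a(m)$), indexes messages from 0 instead of 1, and uses a seeded pseudo-random shuffle purely for reproducibility; a real system would need a cryptographically sound way to choose and share $\sigma_a$. All function names are hypothetical.

```python
import random


def make_permutation(m_n: int, seed: int) -> list:
    """Choose a permutation sigma_a over {0, ..., M_n - 1} (toy key generation)."""
    rng = random.Random(seed)
    sigma = list(range(m_n))
    rng.shuffle(sigma)
    return sigma


def invert(sigma: list) -> list:
    """Return the inverse permutation sigma_a^{-1}."""
    inverse = [0] * len(sigma)
    for m, image in enumerate(sigma):
        inverse[image] = m
    return inverse


def send(m: int, sigma: list) -> int:
    """Transmitter side: the message actually encoded is sigma_a(m)."""
    return sigma[m]


def receive(m_prime: int, sigma_inverse: list) -> int:
    """Receiver side: recover m_hat = sigma_a^{-1}(m')."""
    return sigma_inverse[m_prime]


sigma_a = make_permutation(8, seed=42)
sigma_a_inverse = invert(sigma_a)
recovered = [receive(send(m, sigma_a), sigma_a_inverse) for m in range(8)]
```

A receiver who learns the decoding regions but not $\sigma_a$ only observes $\sigma_a(m)$, which is the sense in which the message stays protected in the paragraph above.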



7 Multi-Terminal Information Theory



7.1 What Is Multi-Terminal Information Theory?

In the preceding chapters, excluding Chapter 2 and Chapter 4 which treat
random number generation and hypothesis testing, respectively, we have given
results on the fundamental problems of how we can realize effective compression or
transmission of data for coding systems with one encoder $\varphi_n$ and one decoder
$\psi_n$. We call the theory dealing with such coding problems the single-user
information theory.

On the other hand, we can also consider coding systems with more than 
one encoder or decoder. These kinds of coding systems appear when we model 
situations where many users simultaneously share one data compression sys- 
tem or channel. For example, such situations occur when we broadcast to 
many earth stations via one communications satellite or many communica- 
tion satellites communicate with one earth station (see Fig. 7.1). We call 
the theory treating these kinds of coding systems with multiple encoders or 
multiple decoders the multi-user information theory or the multi-terminal 
information theory. 




Fig. 7.1. 




This chapter is devoted to observations on the two most fundamental
and most important coding systems in the multi-terminal information
theory. One is on the source coding. Given outputs of correlated multiple
sources, each encoder separately encodes one of the source outputs into a
codeword and transmits the codeword to a center. The center reproduces
the original source outputs simultaneously from the transmitted codewords.
Such a coding system is called the Slepian-Wolf coding system. The other is
on the channel coding. Multiple senders, each one of whom is far from the
others, separately encode information into respective codewords and transmit
the codewords through a channel. One receiver reproduces the information
of the multiple senders from the transmitted codewords. Such a system is
called the multiple-access channel. We first formulate the Slepian-Wolf coding
system in the following section. Formulation of the multiple-access channel
is given in §7.6. We will find a beautiful “duality” between these two coding
problems.



7.2 The Slepian-Wolf Source Coding System



[Figure: source 1 feeds encoder 1 and source 2 feeds encoder 2; both encoders feed a single decoder.]

Fig. 7.2.



Let a pair of correlated sources $(\mathbf{X}_1, \mathbf{X}_2) = \{(X_1^n, X_2^n)\}_{n=1}^{\infty}$ be given. Here,
we assume that the probability distribution $P_{X_1^n X_2^n}$ is arbitrary and $\mathcal{X}_1$ and $\mathcal{X}_2$
are countably infinite source alphabets of sources $\mathbf{X}_1$ and $\mathbf{X}_2$, respectively.
The fixed-length source coding problem for the pair of correlated sources
$(\mathbf{X}_1, \mathbf{X}_2)$ is formulated as follows. We first define the two sets of integers

$$\mathcal{M}_n^{(1)} = \{1, 2, \cdots, M_n^{(1)}\}, \tag{7.2.1}$$

$$\mathcal{M}_n^{(2)} = \{1, 2, \cdots, M_n^{(2)}\} \tag{7.2.2}$$

called the codes. Next, we choose arbitrary mappings $\varphi_n^{(1)} : \mathcal{X}_1^n \to \mathcal{M}_n^{(1)}$
(encoder 1) and $\varphi_n^{(2)} : \mathcal{X}_2^n \to \mathcal{M}_n^{(2)}$ (encoder 2), where $\varphi_n^{(1)}$ (resp. $\varphi_n^{(2)}$) maps
an output $\mathbf{x}_1 \in \mathcal{X}_1^n$ (resp. $\mathbf{x}_2 \in \mathcal{X}_2^n$) from the source $\mathbf{X}_1$ (resp. $\mathbf{X}_2$) to an
element of $\mathcal{M}_n^{(1)}$ (resp. $\mathcal{M}_n^{(2)}$) (Fig. 7.2). We call

$$r_n^{(1)} = \frac{1}{n} \log M_n^{(1)}, \tag{7.2.3}$$

$$r_n^{(2)} = \frac{1}{n} \log M_n^{(2)} \tag{7.2.4}$$

the coding rates of the encoder 1 and the encoder 2, respectively. On the other
hand, a decoder $\psi_n : \mathcal{M}_n^{(1)} \times \mathcal{M}_n^{(2)} \to \mathcal{X}_1^n \times \mathcal{X}_2^n$ receives two outputs $\varphi_n^{(1)}(\mathbf{x}_1)$
and $\varphi_n^{(2)}(\mathbf{x}_2)$ from the two encoders and tries to reproduce the pair of original
source outputs $(\mathbf{x}_1, \mathbf{x}_2)$ (Fig. 7.2). In such a case the error probability $\varepsilon_n$ is
defined by

$$\varepsilon_n = \Pr\left\{(X_1^n, X_2^n) \neq \psi_n(\varphi_n^{(1)}(X_1^n), \varphi_n^{(2)}(X_2^n))\right\}. \tag{7.2.5}$$

Note here that $\varphi_n^{(1)}$ (encoder 1) cannot see the output from the source $\mathbf{X}_2$
and $\varphi_n^{(2)}$ (encoder 2) cannot see the output from the source $\mathbf{X}_1$ (such coding
is called the separate coding). We simply call the triplet $(\varphi_n^{(1)}, \varphi_n^{(2)}, \psi_n)$ of two
encoders and one decoder with the two codes in (7.2.1) and (7.2.2) and the
error probability $\varepsilon_n$ the $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code.

In this coding system we have the two coding rates given by (7.2.3) and
(7.2.4). We consider the problem of making the coding rates $r_n^{(1)}$ and $r_n^{(2)}$
as small as possible subject to the requirement that the error probability $\varepsilon_n$
satisfies $\varepsilon_n \to 0$ as the block length $n \to \infty$. In general, however, there is a
“trade-off” between the two coding rates due to the correlation between $\mathbf{X}_1$
and $\mathbf{X}_2$. That is, making one coding rate small leads to making the other
large. Thus, we formulate the problem in such a way that we find a region of
the pair of the coding rates $(r_n^{(1)}, r_n^{(2)})$ that enables us to find a coding with
$\varepsilon_n \to 0$ as $n \to \infty$. This formulation is formally described in the following
definitions.

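Definition 7.2.1.

A rate pair $(R_1, R_2)$ is achievable
$\overset{\text{def}}{\Longleftrightarrow}$ There exists an $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code satisfying $\lim_{n \to \infty} \varepsilon_n = 0$,

$$\limsup_{n \to \infty} \frac{1}{n} \log M_n^{(1)} \le R_1 \quad \text{and} \quad \limsup_{n \to \infty} \frac{1}{n} \log M_n^{(2)} \le R_2.$$

Definition 7.2.2 (Achievable rate region).

$$R_f(\mathbf{X}_1, \mathbf{X}_2) = \{(R_1, R_2) \mid (R_1, R_2) \text{ is achievable}\}. \tag{7.2.6}$$

This two-dimensional region $R_f(\mathbf{X}_1, \mathbf{X}_2)$ is called the achievable rate region.
The objective of this section is the determination of this achievable rate region.
To this end, we define the following three key quantities:

$$\overline{H}(\mathbf{X}_1 | \mathbf{X}_2) = \text{p-}\limsup_{n \to \infty} \frac{1}{n} \log \frac{1}{P_{X_1^n | X_2^n}(X_1^n | X_2^n)}, \tag{7.2.7}$$

$$\overline{H}(\mathbf{X}_2 | \mathbf{X}_1) = \text{p-}\limsup_{n \to \infty} \frac{1}{n} \log \frac{1}{P_{X_2^n | X_1^n}(X_2^n | X_1^n)}, \tag{7.2.8}$$

$$\overline{H}(\mathbf{X}_1 \mathbf{X}_2) = \text{p-}\limsup_{n \to \infty} \frac{1}{n} \log \frac{1}{P_{X_1^n X_2^n}(X_1^n, X_2^n)}, \tag{7.2.9}$$

where these quantities are called the spectral (conditional) sup-entropy rates,
and denote by $\mathcal{R}(\mathbf{X}_1, \mathbf{X}_2)$ the set of all $(R_1, R_2)$ satisfying

$$R_1 \ge \overline{H}(\mathbf{X}_1 | \mathbf{X}_2), \tag{7.2.10}$$

$$R_2 \ge \overline{H}(\mathbf{X}_2 | \mathbf{X}_1), \tag{7.2.11}$$

$$R_1 + R_2 \ge \overline{H}(\mathbf{X}_1 \mathbf{X}_2). \tag{7.2.12}$$

Then, we have the following quite general theorem.

Theorem 7.2.1 (Miyake and Kanaya [67]).

$$R_f(\mathbf{X}_1, \mathbf{X}_2) = \mathcal{R}(\mathbf{X}_1, \mathbf{X}_2). \tag{7.2.13}$$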


Remark 7.2.1. Among the inequalities (7.2.10)-(7.2.12) that determine
$\mathcal{R}(\mathbf{X}_1, \mathbf{X}_2)$ on the right-hand side of (7.2.13), (7.2.12) can be redundant if
$(\mathbf{X}_1, \mathbf{X}_2)$ is general, which is different from the cases that $(\mathbf{X}_1, \mathbf{X}_2)$ is stationary
and memoryless or stationary and ergodic. In addition, the timesharing
principle does not always hold if $(\mathbf{X}_1, \mathbf{X}_2)$ is general. □
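
For a stationary memoryless pair of sources the three spectral sup-entropy rates reduce to the ordinary conditional and joint entropies, so membership in the region (7.2.10)-(7.2.12) can be tested directly. The sketch below covers only that special case; the joint distribution (a doubly symmetric binary source with crossover probability 0.1), the use of nats, and the function names are illustrative assumptions.

```python
import math


def entropies(joint):
    """Return (H(X1|X2), H(X2|X1), H(X1 X2)) in nats for a joint pmf matrix."""
    h12 = -sum(p * math.log(p) for row in joint for p in row if p > 0)
    p1 = [sum(row) for row in joint]            # marginal of X1 (rows)
    p2 = [sum(col) for col in zip(*joint)]      # marginal of X2 (columns)
    h1 = -sum(p * math.log(p) for p in p1 if p > 0)
    h2 = -sum(p * math.log(p) for p in p2 if p > 0)
    return h12 - h2, h12 - h1, h12


def in_rate_region(r1, r2, joint):
    """Check the three inequalities (7.2.10)-(7.2.12) in the i.i.d. special case."""
    h1_given_2, h2_given_1, h12 = entropies(joint)
    return r1 >= h1_given_2 and r2 >= h2_given_1 and r1 + r2 >= h12


# Doubly symmetric binary source with crossover probability 0.1 (illustrative).
dsbs = [[0.45, 0.05],
        [0.05, 0.45]]
h1_given_2, h2_given_1, h12 = entropies(dsbs)
```

In this memoryless example the sum-rate inequality is active: the pair (0.4, 0.5) satisfies both individual bounds but fails $R_1 + R_2 \ge H(X_1 X_2)$.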



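Proof of Theorem 7.2.1.

1) Direct part:

We prove the direct part by using the following lemma that is a multi-user
version of Lemma 1.3.1 in Chapter 1.

Lemma 7.2.1. Let $M_n^{(1)}$ and $M_n^{(2)}$ be arbitrarily given positive integers.
Then, for all $n = 1, 2, \cdots$ there exists an $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code satisfying

$$\begin{aligned}
\varepsilon_n \le \Pr\Bigg\{ & \frac{1}{n} \log \frac{1}{P_{X_1^n | X_2^n}(X_1^n | X_2^n)} \ge \frac{1}{n} \log M_n^{(1)} - \gamma \\
\text{or } & \frac{1}{n} \log \frac{1}{P_{X_2^n | X_1^n}(X_2^n | X_1^n)} \ge \frac{1}{n} \log M_n^{(2)} - \gamma \\
\text{or } & \frac{1}{n} \log \frac{1}{P_{X_1^n X_2^n}(X_1^n, X_2^n)} \ge \frac{1}{n} \log (M_n^{(1)} M_n^{(2)}) - \gamma \Bigg\} + 3 e^{-n\gamma},
\end{aligned} \tag{7.2.14}$$

where $\gamma > 0$ is an arbitrary constant.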



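Proof. Define the three sets $T_n^{(1)}, T_n^{(2)}, T_n^{(12)} \subset \mathcal{X}_1^n \times \mathcal{X}_2^n$ by

$$T_n^{(1)} = \left\{ (\mathbf{x}_1, \mathbf{x}_2) \,\middle|\, \frac{1}{n} \log \frac{1}{P_{X_1^n | X_2^n}(\mathbf{x}_1 | \mathbf{x}_2)} < \frac{1}{n} \log M_n^{(1)} - \gamma \right\}, \tag{7.2.15}$$

$$T_n^{(2)} = \left\{ (\mathbf{x}_1, \mathbf{x}_2) \,\middle|\, \frac{1}{n} \log \frac{1}{P_{X_2^n | X_1^n}(\mathbf{x}_2 | \mathbf{x}_1)} < \frac{1}{n} \log M_n^{(2)} - \gamma \right\}, \tag{7.2.16}$$

$$T_n^{(12)} = \left\{ (\mathbf{x}_1, \mathbf{x}_2) \,\middle|\, \frac{1}{n} \log \frac{1}{P_{X_1^n X_2^n}(\mathbf{x}_1, \mathbf{x}_2)} < \frac{1}{n} \log (M_n^{(1)} M_n^{(2)}) - \gamma \right\} \tag{7.2.17}$$

and set

$$T_n = T_n^{(1)} \cap T_n^{(2)} \cap T_n^{(12)}. \tag{7.2.18}$$

Then, (7.2.14) can be written as

$$\varepsilon_n \le \Pr\{(X_1^n, X_2^n) \notin T_n\} + 3 e^{-n\gamma}. \tag{7.2.19}$$

Therefore, it is sufficient to show the existence of an $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code
satisfying (7.2.19). To this end, we set

$$\mathcal{M}_n^{(1)} = \{1, 2, \cdots, M_n^{(1)}\}, \quad \mathcal{M}_n^{(2)} = \{1, 2, \cdots, M_n^{(2)}\}$$

and use the random coding argument.

a) Random coding:

For each output $\mathbf{x}_1 \in \mathcal{X}_1^n$ from the source $\mathbf{X}_1$, we generate $i_1 \in \mathcal{M}_n^{(1)}$
randomly subject to the uniform distribution and define $\tilde{\varphi}_n^{(1)}(\mathbf{x}_1) = i_1$ ($\tilde{\varphi}_n^{(1)}$
is a random encoder). Similarly, for each output $\mathbf{x}_2 \in \mathcal{X}_2^n$ from the source
$\mathbf{X}_2$, we generate $i_2 \in \mathcal{M}_n^{(2)}$ randomly subject to the uniform distribution and
define $\tilde{\varphi}_n^{(2)}(\mathbf{x}_2) = i_2$ ($\tilde{\varphi}_n^{(2)}$ is also a random encoder).

b) Decoder:

Suppose that a decoder $\tilde{\psi}_n : \mathcal{M}_n^{(1)} \times \mathcal{M}_n^{(2)} \to \mathcal{X}_1^n \times \mathcal{X}_2^n$ receives a pair
of the outputs $(i_1, i_2) \in \mathcal{M}_n^{(1)} \times \mathcal{M}_n^{(2)}$ from the two encoders $\tilde{\varphi}_n^{(1)}$ and $\tilde{\varphi}_n^{(2)}$.
If there exists a unique $(\mathbf{x}_1, \mathbf{x}_2)$ satisfying $\tilde{\varphi}_n^{(1)}(\mathbf{x}_1) = i_1$, $\tilde{\varphi}_n^{(2)}(\mathbf{x}_2) = i_2$ and
$(\mathbf{x}_1, \mathbf{x}_2) \in T_n$, we define the decoder by $\tilde{\psi}_n(i_1, i_2) = (\mathbf{x}_1, \mathbf{x}_2)$ for such $(\mathbf{x}_1, \mathbf{x}_2)$.
If there exists no such $(\mathbf{x}_1, \mathbf{x}_2)$ or there exist more than one such $(\mathbf{x}_1, \mathbf{x}_2)$, we define
$\tilde{\psi}_n(i_1, i_2)$ as an arbitrarily specified element in $\mathcal{X}_1^n \times \mathcal{X}_2^n$.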

c) Evaluation of the error probability: 

Define the error events En ^ , En ^ , En^ and En^^^ by 



458 7 Multi- Terminal Information Theory 



= {3xi + XI : and (xl.XJ) € T„} , 

£^2) = |3x' ^ X 2 ” : ^i^)(x') = and (Xf,x') € T„} , 

£(12) ^ |3x; ^ xr, 3x' ^ X2” : vii>(x;) = ^^i>(xn, 

^^ 2 ) = and (xi,x^) € T„| . 

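Then, the expectation of the error probability with respect to the random code,

$$\overline{\varepsilon}_n \equiv \Pr\{ (X_1^n, X_2^n) \ne \psi_n(\varphi_n^{(1)}(X_1^n), \varphi_n^{(2)}(X_2^n)) \},$$

is upper-bounded as follows:

$$
\begin{aligned}
\overline{\varepsilon}_n &\le \Pr\{ E_n^{(0)} \cup E_n^{(1)} \cup E_n^{(2)} \cup E_n^{(12)} \} \\
&\le \Pr\{E_n^{(0)}\} + \Pr\{E_n^{(1)}\} + \Pr\{E_n^{(2)}\} + \Pr\{E_n^{(12)}\}.
\end{aligned}
\tag{7.2.20}
$$

Hereafter, we evaluate these terms. First, it clearly holds that

$$\Pr\{E_n^{(0)}\} = \Pr\{ (X_1^n, X_2^n) \notin T_n \}. \tag{7.2.21}$$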

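Next, we evaluate $\Pr\{E_n^{(1)}\}$ in the following way:

$$
\begin{aligned}
\Pr\{E_n^{(1)}\} &= \Pr\{ \exists\, \mathbf{x}_1' \ne X_1^n : \varphi_n^{(1)}(\mathbf{x}_1') = \varphi_n^{(1)}(X_1^n) \text{ and } (\mathbf{x}_1', X_2^n) \in T_n \} \\
&= \sum_{(\mathbf{x}_1,\mathbf{x}_2) \in \mathcal{X}_1^n \times \mathcal{X}_2^n} P_{X_1^nX_2^n}(\mathbf{x}_1,\mathbf{x}_2)\, \Pr\{ \exists\, \mathbf{x}_1' \ne \mathbf{x}_1 : \varphi_n^{(1)}(\mathbf{x}_1') = \varphi_n^{(1)}(\mathbf{x}_1) \text{ and } (\mathbf{x}_1', \mathbf{x}_2) \in T_n \} \\
&\le \sum_{(\mathbf{x}_1,\mathbf{x}_2) \in \mathcal{X}_1^n \times \mathcal{X}_2^n} P_{X_1^nX_2^n}(\mathbf{x}_1,\mathbf{x}_2) \sum_{\mathbf{x}_1' \ne \mathbf{x}_1 : (\mathbf{x}_1',\mathbf{x}_2) \in T_n} \Pr\{ \varphi_n^{(1)}(\mathbf{x}_1') = \varphi_n^{(1)}(\mathbf{x}_1) \} \\
&= \sum_{(\mathbf{x}_1,\mathbf{x}_2) \in \mathcal{X}_1^n \times \mathcal{X}_2^n} P_{X_1^nX_2^n}(\mathbf{x}_1,\mathbf{x}_2) \sum_{\mathbf{x}_1' \ne \mathbf{x}_1 : (\mathbf{x}_1',\mathbf{x}_2) \in T_n} \frac{1}{M_n^{(1)}} \\
&\le \sum_{(\mathbf{x}_1,\mathbf{x}_2) \in \mathcal{X}_1^n \times \mathcal{X}_2^n} P_{X_1^nX_2^n}(\mathbf{x}_1,\mathbf{x}_2) \sum_{\mathbf{x}_1' : (\mathbf{x}_1',\mathbf{x}_2) \in T_n} \frac{1}{M_n^{(1)}} \\
&= \sum_{(\mathbf{x}_1,\mathbf{x}_2) \in \mathcal{X}_1^n \times \mathcal{X}_2^n} P_{X_1^nX_2^n}(\mathbf{x}_1,\mathbf{x}_2)\, \frac{|S_n(\mathbf{x}_2)|}{M_n^{(1)}},
\end{aligned}
\tag{7.2.22}
$$

where

$$S_n(\mathbf{x}_2) = \{ \mathbf{x}_1' \in \mathcal{X}_1^n \mid (\mathbf{x}_1', \mathbf{x}_2) \in T_n \}.$$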

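On the other hand, since (7.2.15) guarantees

$$P_{X_1^n|X_2^n}(\mathbf{x}_1'|\mathbf{x}_2) > \frac{e^{n\gamma}}{M_n^{(1)}}$$

for $(\mathbf{x}_1', \mathbf{x}_2) \in T_n$, it holds that

$$1 \ge \sum_{\mathbf{x}_1' \in S_n(\mathbf{x}_2)} P_{X_1^n|X_2^n}(\mathbf{x}_1'|\mathbf{x}_2) \ge |S_n(\mathbf{x}_2)|\, \frac{e^{n\gamma}}{M_n^{(1)}}.$$

Hence, we obtain

$$|S_n(\mathbf{x}_2)| \le M_n^{(1)} e^{-n\gamma}. \tag{7.2.23}$$

By substituting (7.2.23) into (7.2.22), it follows that

$$\Pr\{E_n^{(1)}\} \le \sum_{(\mathbf{x}_1,\mathbf{x}_2) \in \mathcal{X}_1^n \times \mathcal{X}_2^n} P_{X_1^nX_2^n}(\mathbf{x}_1,\mathbf{x}_2)\, e^{-n\gamma} \le e^{-n\gamma}. \tag{7.2.24}$$

By following the same lines and using the property (7.2.16), we obtain

$$\Pr\{E_n^{(2)}\} \le e^{-n\gamma}. \tag{7.2.25}$$

Furthermore, we can similarly obtain

$$\Pr\{E_n^{(12)}\} \le e^{-n\gamma} \tag{7.2.26}$$

as well from the property (7.2.17). Summarizing, we have

$$\overline{\varepsilon}_n \le \Pr\{ (X_1^n, X_2^n) \notin T_n \} + 3e^{-n\gamma},$$

which guarantees the existence of at least one deterministic $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code satisfying (7.2.19). □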



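Now, we are ready to prove the direct part. Consider an arbitrary $(R_1, R_2)$ satisfying the inequalities (7.2.10)-(7.2.12) and define

$$M_n^{(1)} = e^{n(R_1 + 2\gamma)}, \tag{7.2.27}$$

$$M_n^{(2)} = e^{n(R_2 + 2\gamma)} \tag{7.2.28}$$

for an arbitrarily small constant $\gamma > 0$. Then, Lemma 7.2.1 guarantees the existence of an $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code satisfying

$$
\begin{aligned}
\varepsilon_n &\le \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge R_1 + \gamma \ \text{ or } \ \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)} \ge R_2 + \gamma \\
&\qquad\quad \text{or } \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)} \ge R_1 + R_2 + 3\gamma \Bigr\} + 3e^{-n\gamma} \\
&\le \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge R_1 + \gamma \Bigr\} + \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)} \ge R_2 + \gamma \Bigr\} \\
&\qquad + \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)} \ge R_1 + R_2 + 3\gamma \Bigr\} + 3e^{-n\gamma} \\
&\le \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge \overline{H}(\mathbf{X}_1|\mathbf{X}_2) + \gamma \Bigr\} + \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)} \ge \overline{H}(\mathbf{X}_2|\mathbf{X}_1) + \gamma \Bigr\} \\
&\qquad + \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)} \ge \overline{H}(\mathbf{X}_1\mathbf{X}_2) + 3\gamma \Bigr\} + 3e^{-n\gamma}.
\end{aligned}
\tag{7.2.29}
$$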



We notice here that all of the four terms on the right-hand side of (7.2.29) converge to 0 as $n \to \infty$ due to the definitions of the spectral (conditional) sup-entropy rates. Therefore, the error probability satisfies $\varepsilon_n \to 0$ as $n \to \infty$.

In order to complete the proof, we choose a sequence $\{\gamma_k\}$ satisfying $\gamma_1 > \gamma_2 > \cdots > 0$ and $\gamma_k \to 0$ as $k \to \infty$ and repeat the argument above in the order of $\gamma = \gamma_1, \gamma = \gamma_2, \cdots$. This establishes the claim that $(R_1, R_2)$ is achievable (for the diagonal line argument, see the proof 1) of Theorem 1.8.2 in Chapter 1).



2) Converse part: 

Since we consider the general sources, the Fano inequality does not give a 
sufficiently tight upper bound. Thus, for proving this part we use the following 
lemma that is a generalized version of Lemma 1.3.2 in Chapter 1 to the multi- 
user case. 



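Lemma 7.2.2. Any $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code satisfies

$$
\begin{aligned}
\varepsilon_n \ge \Pr\Bigl\{ &\frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge \frac{1}{n}\log M_n^{(1)} + \gamma \\
&\text{or } \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)} \ge \frac{1}{n}\log M_n^{(2)} + \gamma \\
&\text{or } \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)} \ge \frac{1}{n}\log\bigl(M_n^{(1)}M_n^{(2)}\bigr) + \gamma \Bigr\} - 3e^{-n\gamma}
\end{aligned}
\tag{7.2.30}
$$

for all $n = 1, 2, \cdots$, where $\gamma > 0$ is an arbitrary constant.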

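Proof. Denote the encoders of an $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code by $\varphi_n^{(1)}$ and $\varphi_n^{(2)}$ and the decoder by $\psi_n$. If we define the four subsets $T_n^{(1)}, T_n^{(2)}, T_n^{(12)}, S_n \subset \mathcal{X}_1^n \times \mathcal{X}_2^n$ by

$$T_n^{(1)} = \Bigl\{ (\mathbf{x}_1,\mathbf{x}_2) \Bigm| \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(\mathbf{x}_1|\mathbf{x}_2)} \ge \frac{1}{n}\log M_n^{(1)} + \gamma \Bigr\},$$

$$T_n^{(2)} = \Bigl\{ (\mathbf{x}_1,\mathbf{x}_2) \Bigm| \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(\mathbf{x}_2|\mathbf{x}_1)} \ge \frac{1}{n}\log M_n^{(2)} + \gamma \Bigr\},$$

$$T_n^{(12)} = \Bigl\{ (\mathbf{x}_1,\mathbf{x}_2) \Bigm| \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(\mathbf{x}_1,\mathbf{x}_2)} \ge \frac{1}{n}\log\bigl(M_n^{(1)}M_n^{(2)}\bigr) + \gamma \Bigr\},$$

$$S_n = \bigl\{ (\mathbf{x}_1,\mathbf{x}_2) \bigm| (\mathbf{x}_1,\mathbf{x}_2) = \psi_n(\varphi_n^{(1)}(\mathbf{x}_1), \varphi_n^{(2)}(\mathbf{x}_2)) \bigr\},$$

and set

$$T_n = T_n^{(1)} \cup T_n^{(2)} \cup T_n^{(12)},$$

(7.2.30) can be written as

$$\varepsilon_n \ge \Pr\{ (X_1^n, X_2^n) \in T_n \} - 3e^{-n\gamma}. \tag{7.2.31}$$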

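Hereafter, we prove (7.2.31). Writing $\Pr\{A\}$ for $\Pr\{(X_1^n, X_2^n) \in A\}$ and $S_n^c$ for the complement of $S_n$, it clearly follows that

$$
\begin{aligned}
\Pr\{T_n\} &= \Pr\{T_n \cap S_n^c\} + \Pr\{T_n \cap S_n\} \\
&\le \Pr\{S_n^c\} + \Pr\{T_n \cap S_n\} \\
&= \varepsilon_n + \Pr\{T_n \cap S_n\} \\
&\le \varepsilon_n + \Pr\{T_n^{(1)} \cap S_n\} + \Pr\{T_n^{(2)} \cap S_n\} + \Pr\{T_n^{(12)} \cap S_n\}.
\end{aligned}
\tag{7.2.32}
$$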

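On the other hand, by using $P_{X_1^n|X_2^n}(\mathbf{x}_1|\mathbf{x}_2) \le \dfrac{e^{-n\gamma}}{M_n^{(1)}}$ for $(\mathbf{x}_1,\mathbf{x}_2) \in T_n^{(1)}$, we have

$$
\begin{aligned}
\Pr\{T_n^{(1)} \cap S_n\} &= \sum_{(\mathbf{x}_1,\mathbf{x}_2) \in T_n^{(1)} \cap S_n} P_{X_2^n}(\mathbf{x}_2)\, P_{X_1^n|X_2^n}(\mathbf{x}_1|\mathbf{x}_2) \\
&\le \sum_{(\mathbf{x}_1,\mathbf{x}_2) \in T_n^{(1)} \cap S_n} P_{X_2^n}(\mathbf{x}_2)\, \frac{e^{-n\gamma}}{M_n^{(1)}} \\
&\le \sum_{(\mathbf{x}_1,\mathbf{x}_2) \in S_n} P_{X_2^n}(\mathbf{x}_2)\, \frac{e^{-n\gamma}}{M_n^{(1)}} \\
&= \sum_{\mathbf{x}_2 \in \mathcal{X}_2^n} P_{X_2^n}(\mathbf{x}_2)\, |S_n^{(1)}(\mathbf{x}_2)|\, \frac{e^{-n\gamma}}{M_n^{(1)}},
\end{aligned}
$$

where

$$S_n^{(1)}(\mathbf{x}_2) = \{ \mathbf{x}_1 \in \mathcal{X}_1^n \mid (\mathbf{x}_1,\mathbf{x}_2) \in S_n \}.$$

By using $|S_n^{(1)}(\mathbf{x}_2)| \le M_n^{(1)}$, we obtain

$$\Pr\{T_n^{(1)} \cap S_n\} \le \sum_{\mathbf{x}_2 \in \mathcal{X}_2^n} P_{X_2^n}(\mathbf{x}_2)\, e^{-n\gamma} = e^{-n\gamma}. \tag{7.2.33}$$

Similarly, we obtain

$$\Pr\{T_n^{(2)} \cap S_n\} \le e^{-n\gamma}, \tag{7.2.34}$$

$$\Pr\{T_n^{(12)} \cap S_n\} \le e^{-n\gamma}. \tag{7.2.35}$$

Consequently, in view of (7.2.32) and (7.2.33)-(7.2.35), we have

$$\Pr\{T_n\} \le \varepsilon_n + 3e^{-n\gamma},$$

which yields the claim of this lemma. □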

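Now, we can prove the converse part. Suppose that $(R_1, R_2)$ is achievable. Then, there exists an $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code satisfying

$$\limsup_{n\to\infty} \frac{1}{n}\log M_n^{(1)} \le R_1, \tag{7.2.36}$$

$$\limsup_{n\to\infty} \frac{1}{n}\log M_n^{(2)} \le R_2 \tag{7.2.37}$$

and $\lim_{n\to\infty} \varepsilon_n = 0$. Letting $\gamma > 0$ be an arbitrarily small constant, (7.2.36) and (7.2.37) imply

$$\frac{1}{n}\log M_n^{(1)} \le R_1 + \gamma \quad (\forall n \ge n_0),$$

$$\frac{1}{n}\log M_n^{(2)} \le R_2 + \gamma \quad (\forall n \ge n_0).$$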

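By substituting these inequalities into (7.2.30) in Lemma 7.2.2, we have

$$\varepsilon_n \ge \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge R_1 + 2\gamma \Bigr\} - 3e^{-n\gamma}, \tag{7.2.38}$$

$$\varepsilon_n \ge \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)} \ge R_2 + 2\gamma \Bigr\} - 3e^{-n\gamma}, \tag{7.2.39}$$

$$\varepsilon_n \ge \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)} \ge R_1 + R_2 + 3\gamma \Bigr\} - 3e^{-n\gamma}. \tag{7.2.40}$$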



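Application of $\lim_{n\to\infty} \varepsilon_n = 0$ to (7.2.38) yields

$$\lim_{n\to\infty} \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge R_1 + 2\gamma \Bigr\} = 0,$$

which leads to

$$R_1 + 2\gamma \ge \overline{H}(\mathbf{X}_1|\mathbf{X}_2).$$

Since $\gamma > 0$ is arbitrary, we obtain

$$R_1 \ge \overline{H}(\mathbf{X}_1|\mathbf{X}_2)$$

by letting $\gamma \to 0$. Next, from the same argument using (7.2.39), we obtain

$$R_2 \ge \overline{H}(\mathbf{X}_2|\mathbf{X}_1).$$

Furthermore, we obtain

$$R_1 + R_2 \ge \overline{H}(\mathbf{X}_1\mathbf{X}_2)$$

from the same argument using (7.2.40). □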

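Example 7.2.1. If $(\mathbf{X}_1, \mathbf{X}_2) = \{(X_1^n, X_2^n)\}_{n=1}^{\infty}$ is stationary and ergodic, we have

$$\overline{H}(\mathbf{X}_1|\mathbf{X}_2) = H(\mathbf{X}_1|\mathbf{X}_2) = \lim_{n\to\infty} \frac{1}{n} H(X_1^n|X_2^n), \tag{7.2.41}$$

$$\overline{H}(\mathbf{X}_2|\mathbf{X}_1) = H(\mathbf{X}_2|\mathbf{X}_1) = \lim_{n\to\infty} \frac{1}{n} H(X_2^n|X_1^n), \tag{7.2.42}$$

$$\overline{H}(\mathbf{X}_1\mathbf{X}_2) = H(\mathbf{X}_1\mathbf{X}_2) = \lim_{n\to\infty} \frac{1}{n} H(X_1^nX_2^n). \tag{7.2.43}$$

These equalities enable us to express the achievable rate region $R_f(\mathbf{X}_1, \mathbf{X}_2)$ by using the ordinary (conditional) entropy rates $H(\mathbf{X}_1|\mathbf{X}_2)$, $H(\mathbf{X}_2|\mathbf{X}_1)$ and $H(\mathbf{X}_1\mathbf{X}_2)$ (cf. Cover [16]). In particular, if $(\mathbf{X}_1, \mathbf{X}_2)$ is the memoryless source pair subject to a joint probability distribution $P_{X_1X_2}$, (7.2.41)-(7.2.43) can be written in terms of the (conditional) entropies as follows:

$$\overline{H}(\mathbf{X}_1|\mathbf{X}_2) = H(X_1|X_2), \tag{7.2.44}$$

$$\overline{H}(\mathbf{X}_2|\mathbf{X}_1) = H(X_2|X_1), \tag{7.2.45}$$

$$\overline{H}(\mathbf{X}_1\mathbf{X}_2) = H(X_1X_2) \tag{7.2.46}$$

(cf. Slepian and Wolf [83]). In addition, for the case that $\mathcal{X}_1$ and $\mathcal{X}_2$ are finite source alphabets, we have (7.2.41)-(7.2.43) provided that all of $(\mathbf{X}_1, \mathbf{X}_2)$, $\mathbf{X}_1$ and $\mathbf{X}_2$ satisfy the strong converse property. That is, we have (7.2.41)-(7.2.43) even if $(\mathbf{X}_1, \mathbf{X}_2)$ is not stationary and ergodic. This is due to (the conditional version of) Corollary 1.7.1 in Chapter 1. □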
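For the memoryless case of Example 7.2.1, membership in the achievable rate region reduces to three entropy comparisons, which can be checked numerically. The following is a minimal sketch, assuming the joint distribution $P_{X_1X_2}$ is supplied as a Python dictionary and that rates are measured in nats; the function names `entropies` and `in_slepian_wolf_region` are hypothetical and introduced only for illustration.

```python
import math


def entropies(pxy):
    """Return (H(X1|X2), H(X2|X1), H(X1X2)) in nats.

    pxy is a joint pmf given as {(a, b): probability}.
    """
    p1, p2 = {}, {}
    for (a, b), p in pxy.items():
        p1[a] = p1.get(a, 0.0) + p   # marginal of X1
        p2[b] = p2.get(b, 0.0) + p   # marginal of X2

    def h(dist):
        return -sum(p * math.log(p) for p in dist.values() if p > 0)

    h12 = h(pxy)
    return h12 - h(p2), h12 - h(p1), h12


def in_slepian_wolf_region(r1, r2, pxy):
    """Check R1 >= H(X1|X2), R2 >= H(X2|X1), R1 + R2 >= H(X1X2)."""
    h1g2, h2g1, h12 = entropies(pxy)
    return r1 >= h1g2 and r2 >= h2g1 and r1 + r2 >= h12
```

For a doubly symmetric binary pair with crossover probability 0.1, for instance, the rate pair (0.7, 0.35) passes all three tests, while (0.4, 0.4) satisfies the two conditional constraints but fails the sum-rate constraint.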




7.3 Slepian-Wolf Source Coding for Mixed Sources 

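Let

$$(\mathbf{X}_1^{(1)}, \mathbf{X}_2^{(1)}) = \{(X_1^{(1)n}, X_2^{(1)n})\}_{n=1}^{\infty}$$

and

$$(\mathbf{X}_1^{(2)}, \mathbf{X}_2^{(2)}) = \{(X_1^{(2)n}, X_2^{(2)n})\}_{n=1}^{\infty}$$

be arbitrary two pairs of correlated sources, where $\mathcal{X}_1$ and $\mathcal{X}_2$ are countably infinite source alphabets. We define the mixed source pair

$$(\mathbf{X}_1, \mathbf{X}_2) = \{(X_1^n, X_2^n)\}_{n=1}^{\infty}$$

of two general source pairs by

$$P_{X_1^nX_2^n}(\mathbf{x}_1,\mathbf{x}_2) = \alpha_1 P_{X_1^{(1)n}X_2^{(1)n}}(\mathbf{x}_1,\mathbf{x}_2) + \alpha_2 P_{X_1^{(2)n}X_2^{(2)n}}(\mathbf{x}_1,\mathbf{x}_2),$$

where $\alpha_1$ and $\alpha_2$ are arbitrary constants satisfying $\alpha_1 > 0$, $\alpha_2 > 0$ and $\alpha_1 + \alpha_2 = 1$. Then, we have the following lemma on the spectral (conditional) sup-entropy rates, which is an extended version of Lemma 1.4.1 in Chapter 1.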

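Lemma 7.3.1.

$$\overline{H}(\mathbf{X}_1|\mathbf{X}_2) = \max\bigl( \overline{H}(\mathbf{X}_1^{(1)}|\mathbf{X}_2^{(1)}),\ \overline{H}(\mathbf{X}_1^{(2)}|\mathbf{X}_2^{(2)}) \bigr), \tag{7.3.1}$$

$$\overline{H}(\mathbf{X}_2|\mathbf{X}_1) = \max\bigl( \overline{H}(\mathbf{X}_2^{(1)}|\mathbf{X}_1^{(1)}),\ \overline{H}(\mathbf{X}_2^{(2)}|\mathbf{X}_1^{(2)}) \bigr), \tag{7.3.2}$$

$$\overline{H}(\mathbf{X}_1\mathbf{X}_2) = \max\bigl( \overline{H}(\mathbf{X}_1^{(1)}\mathbf{X}_2^{(1)}),\ \overline{H}(\mathbf{X}_1^{(2)}\mathbf{X}_2^{(2)}) \bigr). \tag{7.3.3}$$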



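Proof. We have only to write $\log P_{X_1^n|X_2^n}(X_1^n|X_2^n)$ and $\log P_{X_2^n|X_1^n}(X_2^n|X_1^n)$ as

$$\log P_{X_1^n|X_2^n}(X_1^n|X_2^n) = \log P_{X_1^nX_2^n}(X_1^n, X_2^n) - \log P_{X_2^n}(X_2^n),$$

$$\log P_{X_2^n|X_1^n}(X_2^n|X_1^n) = \log P_{X_1^nX_2^n}(X_1^n, X_2^n) - \log P_{X_1^n}(X_1^n),$$

respectively, and apply the same argument as in the proof of Lemma 1.4.1 to each term on the right-hand sides. □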

The combination of Theorem 7.2.1 and Lemma 7.3.1 immediately yields 
the following source coding theorem for the mixed source pair. 

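Theorem 7.3.1. The achievable rate region of the mixed source pair $(\mathbf{X}_1, \mathbf{X}_2)$ is given by

$$R_f(\mathbf{X}_1, \mathbf{X}_2) = \mathcal{R}\bigl( \mathbf{X}_1^{(1)}, \mathbf{X}_2^{(1)}; \mathbf{X}_1^{(2)}, \mathbf{X}_2^{(2)} \bigr), \tag{7.3.4}$$

where $\mathcal{R}\bigl( \mathbf{X}_1^{(1)}, \mathbf{X}_2^{(1)}; \mathbf{X}_1^{(2)}, \mathbf{X}_2^{(2)} \bigr)$ on the right-hand side denotes the collection of all $(R_1, R_2)$ satisfying

$$R_1 \ge \max\bigl( \overline{H}(\mathbf{X}_1^{(1)}|\mathbf{X}_2^{(1)}),\ \overline{H}(\mathbf{X}_1^{(2)}|\mathbf{X}_2^{(2)}) \bigr),$$

$$R_2 \ge \max\bigl( \overline{H}(\mathbf{X}_2^{(1)}|\mathbf{X}_1^{(1)}),\ \overline{H}(\mathbf{X}_2^{(2)}|\mathbf{X}_1^{(2)}) \bigr),$$

$$R_1 + R_2 \ge \max\bigl( \overline{H}(\mathbf{X}_1^{(1)}\mathbf{X}_2^{(1)}),\ \overline{H}(\mathbf{X}_1^{(2)}\mathbf{X}_2^{(2)}) \bigr).$$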

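The mixed source pair considered above can be generalized in the following way. Suppose that infinitely many pairs of general sources

$$(\mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)}) = \{(X_1^{(i)n}, X_2^{(i)n})\}_{n=1}^{\infty} \quad (i = 1, 2, \cdots)$$

are given. We call the source pair $(\mathbf{X}_1, \mathbf{X}_2) = \{(X_1^n, X_2^n)\}_{n=1}^{\infty}$ defined by

$$P_{X_1^nX_2^n}(\mathbf{x}_1,\mathbf{x}_2) = \sum_{i=1}^{\infty} \alpha_i P_{X_1^{(i)n}X_2^{(i)n}}(\mathbf{x}_1,\mathbf{x}_2) \quad (\forall n = 1, 2, \cdots;\ \forall (\mathbf{x}_1,\mathbf{x}_2) \in \mathcal{X}_1^n \times \mathcal{X}_2^n) \tag{7.3.5}$$

the mixed source pair of a source family $\{(\mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})\}_{i=1}^{\infty}$, where $\alpha_i$ $(i = 1, 2, \cdots)$ are constants satisfying

$$\sum_{i=1}^{\infty} \alpha_i = 1 \quad (\alpha_i \ge 0;\ \forall i = 1, 2, \cdots).$$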

We have the following lemma, which is a generalized version of Lemma 7.3.1. 

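Lemma 7.3.2. For the mixed source pair $(\mathbf{X}_1, \mathbf{X}_2)$ defined by (7.3.5) it holds that

$$\overline{H}(\mathbf{X}_1|\mathbf{X}_2) = \sup_{i:\alpha_i>0} \overline{H}(\mathbf{X}_1^{(i)}|\mathbf{X}_2^{(i)}), \tag{7.3.6}$$

$$\overline{H}(\mathbf{X}_2|\mathbf{X}_1) = \sup_{i:\alpha_i>0} \overline{H}(\mathbf{X}_2^{(i)}|\mathbf{X}_1^{(i)}), \tag{7.3.7}$$

$$\overline{H}(\mathbf{X}_1\mathbf{X}_2) = \sup_{i:\alpha_i>0} \overline{H}(\mathbf{X}_1^{(i)}\mathbf{X}_2^{(i)}). \tag{7.3.8}$$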

Proof. This lemma follows from computation of the information-spectra sim- 
ilarly to the proofs of Lemma 1.4.3 in §1.4 and Lemma 3.3.2 in §3.3. □ 

The combination of Theorem 7.2.1 and Lemma 7.3.2 immediately yields 
the following theorem. 

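Theorem 7.3.2. The achievable rate region $R_f(\mathbf{X}_1, \mathbf{X}_2)$ of the mixed source pair defined by (7.3.5) is given by

$$R_f(\mathbf{X}_1, \mathbf{X}_2) = \mathcal{R}\bigl( \{(\alpha_i; \mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})\}_{i=1}^{\infty} \bigr), \tag{7.3.9}$$

where $\mathcal{R}\bigl( \{(\alpha_i; \mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})\}_{i=1}^{\infty} \bigr)$ represents the collection of all $(R_1, R_2)$ satisfying the inequalities

$$R_1 \ge \sup_{i:\alpha_i>0} \overline{H}(\mathbf{X}_1^{(i)}|\mathbf{X}_2^{(i)}), \tag{7.3.10}$$

$$R_2 \ge \sup_{i:\alpha_i>0} \overline{H}(\mathbf{X}_2^{(i)}|\mathbf{X}_1^{(i)}), \tag{7.3.11}$$

$$R_1 + R_2 \ge \sup_{i:\alpha_i>0} \overline{H}(\mathbf{X}_1^{(i)}\mathbf{X}_2^{(i)}). \tag{7.3.12}$$

□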
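The structure of the region in Theorem 7.3.2 is that each of the three thresholds is governed by the worst component with positive weight. A small sketch of this rule follows, under two assumptions that are not in the text: the family is finite, so each supremum becomes a maximum, and the three spectral sup-entropy rates of every component are already known numbers. The names `mixed_region_thresholds` and `in_mixed_region` are hypothetical.

```python
def mixed_region_thresholds(components):
    """Thresholds (on R1, on R2, on R1 + R2) for a finite mixed source pair.

    components: list of (alpha_i, (h1g2_i, h2g1_i, h12_i)), where the triple
    holds the spectral sup-entropy rates of the i-th component pair.
    Components with alpha_i == 0 are ignored, as in (7.3.10)-(7.3.12).
    """
    active = [rates for alpha, rates in components if alpha > 0]
    return tuple(max(rates[k] for rates in active) for k in range(3))


def in_mixed_region(r1, r2, components):
    """Check the three inequalities (7.3.10)-(7.3.12) for the rate pair."""
    t1, t2, t12 = mixed_region_thresholds(components)
    return r1 >= t1 and r2 >= t2 and r1 + r2 >= t12
```

Note that the weights $\alpha_i$ enter only through whether they are positive; their actual values do not affect the region, which is what distinguishes this result from the $\varepsilon$-achievable setting.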



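If we consider a special case that $(\mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})$ $(i = 1, 2, \cdots)$ are stationary and ergodic, Theorem 7.3.2 leads to the following corollary.

Corollary 7.3.1. Suppose that $\mathcal{X}_1$ and $\mathcal{X}_2$ are countably infinite source alphabets. If $(\mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})$ $(i = 1, 2, \cdots)$ are stationary and ergodic, then for the mixed source pair $(\mathbf{X}_1, \mathbf{X}_2)$ defined by (7.3.5) we have

$$R_f(\mathbf{X}_1, \mathbf{X}_2) = \mathcal{R}\bigl( \{(\alpha_i; \mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})\}_{i=1}^{\infty} \bigr), \tag{7.3.13}$$

where $\mathcal{R}\bigl( \{(\alpha_i; \mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})\}_{i=1}^{\infty} \bigr)$ is the collection of all $(R_1, R_2)$ satisfying the inequalities

$$R_1 \ge \sup_{i:\alpha_i>0} H(\mathbf{X}_1^{(i)}|\mathbf{X}_2^{(i)}), \tag{7.3.14}$$

$$R_2 \ge \sup_{i:\alpha_i>0} H(\mathbf{X}_2^{(i)}|\mathbf{X}_1^{(i)}), \tag{7.3.15}$$

$$R_1 + R_2 \ge \sup_{i:\alpha_i>0} H(\mathbf{X}_1^{(i)}\mathbf{X}_2^{(i)}) \tag{7.3.16}$$

and $H(\mathbf{X}_1^{(i)}|\mathbf{X}_2^{(i)})$, $H(\mathbf{X}_2^{(i)}|\mathbf{X}_1^{(i)})$ and $H(\mathbf{X}_1^{(i)}\mathbf{X}_2^{(i)})$ are the (conditional) entropy rates defined by

$$H(\mathbf{X}_1^{(i)}|\mathbf{X}_2^{(i)}) = \lim_{n\to\infty} \frac{1}{n} H(X_1^{(i)n}|X_2^{(i)n}),$$

$$H(\mathbf{X}_2^{(i)}|\mathbf{X}_1^{(i)}) = \lim_{n\to\infty} \frac{1}{n} H(X_2^{(i)n}|X_1^{(i)n}),$$

$$H(\mathbf{X}_1^{(i)}\mathbf{X}_2^{(i)}) = \lim_{n\to\infty} \frac{1}{n} H(X_1^{(i)n}X_2^{(i)n}).$$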

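Now, let us consider the compound source pair, which is deeply related to the mixed source pair. First, suppose that infinitely many pairs of general sources

$$(\mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)}) = \{(X_1^{(i)n}, X_2^{(i)n})\}_{n=1}^{\infty} \quad (i = 1, 2, \cdots)$$

are given. For the sets of integers

$$\mathcal{M}_n^{(1)} = \{1, 2, \cdots, M_n^{(1)}\}, \qquad \mathcal{M}_n^{(2)} = \{1, 2, \cdots, M_n^{(2)}\},$$

we define a triplet of two encoders $\varphi_n^{(1)} : \mathcal{X}_1^n \to \mathcal{M}_n^{(1)}$ and $\varphi_n^{(2)} : \mathcal{X}_2^n \to \mathcal{M}_n^{(2)}$ and one decoder $\psi_n : \mathcal{M}_n^{(1)} \times \mathcal{M}_n^{(2)} \to \mathcal{X}_1^n \times \mathcal{X}_2^n$ (assume that the triplet does not depend on each individual source pair $(\mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})$ for $i = 1, 2, \cdots$). If we apply the triplet to each source pair $(\mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})$, the error probability $\varepsilon_n^{(i)}$ is given by

$$\varepsilon_n^{(i)} = \Pr\bigl\{ (X_1^{(i)n}, X_2^{(i)n}) \ne \psi_n\bigl(\varphi_n^{(1)}(X_1^{(i)n}), \varphi_n^{(2)}(X_2^{(i)n})\bigr) \bigr\}. \tag{7.3.17}$$

We consider the situation that one of the source pairs $(\mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})$ $(i = 1, 2, \cdots)$ surely generates an output but the triplet does not know which one of the source pairs actually generates an output. Such a source is called the compound source pair

$$(\mathbf{X}_1, \mathbf{X}_2) = \{(\mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})\}_{i=1}^{\infty}.$$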

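In encoding of the compound source pair, we would like to keep the error probability small for any source that actually generates an output. Denoting by $(n, M_n^{(1)}, M_n^{(2)}, \{\varepsilon_n^{(i)}\}_{i=1}^{\infty})$-code the triplets with code sizes $M_n^{(1)}$ and $M_n^{(2)}$ and the error probability $\varepsilon_n^{(i)}$ $(i = 1, 2, \cdots)$, we give the following definitions.

Definition 7.3.1.

Rate pair $(R_1, R_2)$ is achievable

$\stackrel{\text{def}}{\Longleftrightarrow}$ There exists an $(n, M_n^{(1)}, M_n^{(2)}, \{\varepsilon_n^{(i)}\}_{i=1}^{\infty})$-code satisfying $\lim_{n\to\infty} \varepsilon_n^{(i)} = 0$ $(\forall i = 1, 2, \cdots)$, $\limsup_{n\to\infty} \frac{1}{n}\log M_n^{(1)} \le R_1$ and $\limsup_{n\to\infty} \frac{1}{n}\log M_n^{(2)} \le R_2$.

Definition 7.3.2 (Achievable rate region for the compound source pair).

$$R_f(\mathbf{X}_1, \mathbf{X}_2) = \{ (R_1, R_2) \mid (R_1, R_2) \text{ is achievable} \}.$$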



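Then, we have the following theorem describing a relationship between the achievable rate regions for the mixed source pair and the compound source pair. This theorem corresponds to Theorem 3.3.5 in Chapter 3 treating channel coding.

Theorem 7.3.3. Suppose that $\mathcal{X}_1$ and $\mathcal{X}_2$ are countably infinite source alphabets. Consider the mixed source pair $(\mathbf{X}_1, \mathbf{X}_2)$ of countably infinite source pairs

$$(\mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)}) = \{(X_1^{(i)n}, X_2^{(i)n})\}_{n=1}^{\infty} \quad (i = 1, 2, \cdots)$$

and the compound source pair $(\mathbf{X}_1, \mathbf{X}_2) = \{(\mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})\}_{i=1}^{\infty}$ of the same $(\mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})$ $(i = 1, 2, \cdots)$. Then, the achievable rate region

$$R_f(\mathbf{X}_1, \mathbf{X}_2) = R_f\bigl( \{(\alpha_i; \mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})\}_{i=1}^{\infty} \bigr)$$

of the mixed source pair $(\mathbf{X}_1, \mathbf{X}_2)$ is equal to the achievable rate region

$$R_f(\mathbf{X}_1, \mathbf{X}_2) = R_f\bigl( \{(\mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})\}_{i=1}^{\infty} \bigr)$$

of the compound source pair $(\mathbf{X}_1, \mathbf{X}_2) = \{(\mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})\}_{i=1}^{\infty}$, i.e., it holds that

$$R_f\bigl( \{(\alpha_i; \mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})\}_{i=1}^{\infty} \bigr) = R_f\bigl( \{(\mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})\}_{i=1}^{\infty} \bigr), \tag{7.3.18}$$

where we assume $\alpha_i > 0$ for all $i = 1, 2, \cdots$.

Proof. Let $\varepsilon_n^{(i)}$ $(i = 1, 2, \cdots)$ and $\varepsilon_n$ be the error probabilities for respective source pairs $(\mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)})$ $(i = 1, 2, \cdots)$ and the mixed source pair $(\mathbf{X}_1, \mathbf{X}_2)$ under the same pair of encoders $(\varphi_n^{(1)}, \varphi_n^{(2)})$ and a decoder $\psi_n$. Then, (7.3.5) implies that

$$\varepsilon_n = \sum_{i=1}^{\infty} \alpha_i \varepsilon_n^{(i)}.$$

The claim of this theorem follows similarly to the proof of Theorem 3.3.5 in Chapter 3. □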
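As a worked step (a sketch of why the decomposition of the error probability suffices, not the argument of Theorem 3.3.5 itself), let $S_n$ denote the set of pairs $(\mathbf{x}_1,\mathbf{x}_2)$ that the fixed triplet decodes correctly; the symbol $S_n$ is borrowed here from the proof of Lemma 7.2.2. Then, by (7.3.5),

$$
\varepsilon_n = \sum_{(\mathbf{x}_1,\mathbf{x}_2) \notin S_n} P_{X_1^nX_2^n}(\mathbf{x}_1,\mathbf{x}_2)
= \sum_{i=1}^{\infty} \alpha_i \sum_{(\mathbf{x}_1,\mathbf{x}_2) \notin S_n} P_{X_1^{(i)n}X_2^{(i)n}}(\mathbf{x}_1,\mathbf{x}_2)
= \sum_{i=1}^{\infty} \alpha_i \varepsilon_n^{(i)}.
$$

Consequently $\varepsilon_n^{(i)} \le \varepsilon_n / \alpha_i$ for every $i$ with $\alpha_i > 0$, so $\varepsilon_n \to 0$ forces $\varepsilon_n^{(i)} \to 0$ for each $i$. Conversely, if $\varepsilon_n^{(i)} \to 0$ for each $i$, then $\varepsilon_n \to 0$ by dominated convergence, since $0 \le \varepsilon_n^{(i)} \le 1$ and $\sum_i \alpha_i = 1$.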

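Now, let us return to the mixed source pair. We consider here the mixed source with a more general way of mixing (see §1.4 in Chapter 1). Let $\Theta$ be an arbitrary set (probability space). We assign a general source pair

$$(\mathbf{X}_1^{(\theta)}, \mathbf{X}_2^{(\theta)}) = \{(X_1^{(\theta)n}, X_2^{(\theta)n})\}_{n=1}^{\infty}$$

to each $\theta \in \Theta$. Here, letting the source alphabets be $\mathcal{X}_1$ and $\mathcal{X}_2$, for all $n = 1, 2, \cdots$ and for all $(\mathbf{x}_1,\mathbf{x}_2) \in \mathcal{X}_1^n \times \mathcal{X}_2^n$ the probability $P_{X_1^{(\theta)n}X_2^{(\theta)n}}(\mathbf{x}_1,\mathbf{x}_2)$ with respect to $(X_1^{(\theta)n}, X_2^{(\theta)n})$ is assumed to be a measurable function of $\theta$.

By determining an arbitrary probability measure $w$ on $\Theta$, we have the source $(\mathbf{X}_1, \mathbf{X}_2) = \{(X_1^n, X_2^n)\}_{n=1}^{\infty}$ subject to the probability distribution

$$P_{X_1^nX_2^n}(\mathbf{x}_1,\mathbf{x}_2) = \int_{\Theta} P_{X_1^{(\theta)n}X_2^{(\theta)n}}(\mathbf{x}_1,\mathbf{x}_2)\, dw(\theta) \quad (\forall n = 1, 2, \cdots;\ \forall (\mathbf{x}_1,\mathbf{x}_2) \in \mathcal{X}_1^n \times \mathcal{X}_2^n). \tag{7.3.19}$$

We call $(\mathbf{X}_1, \mathbf{X}_2)$ the mixed source pair of a source family $\{(\mathbf{X}_1^{(\theta)}, \mathbf{X}_2^{(\theta)})\}_{\theta \in \Theta}$.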






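We now define functions of $R$ directly dependent on the information-spectra as follows:

$$\underline{F}(R|\mathbf{X}_1|\mathbf{X}_2) = \liminf_{n\to\infty} \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge R \Bigr\}, \tag{7.3.20}$$

$$\overline{F}(R|\mathbf{X}_1|\mathbf{X}_2) = \limsup_{n\to\infty} \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge R \Bigr\}, \tag{7.3.21}$$

$$\underline{F}(R|\mathbf{X}_2|\mathbf{X}_1) = \liminf_{n\to\infty} \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)} \ge R \Bigr\}, \tag{7.3.22}$$

$$\overline{F}(R|\mathbf{X}_2|\mathbf{X}_1) = \limsup_{n\to\infty} \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)} \ge R \Bigr\}, \tag{7.3.23}$$

$$\underline{F}(R|\mathbf{X}_1\mathbf{X}_2) = \liminf_{n\to\infty} \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)} \ge R \Bigr\}, \tag{7.3.24}$$

$$\overline{F}(R|\mathbf{X}_1\mathbf{X}_2) = \limsup_{n\to\infty} \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)} \ge R \Bigr\}, \tag{7.3.25}$$

and try to express these functions in terms of $w(\cdot)$. Since this problem is difficult if $(\mathbf{X}_1^{(\theta)}, \mathbf{X}_2^{(\theta)})$ $(\theta \in \Theta)$ is general, we consider the case that $\mathcal{X}_1$ and $\mathcal{X}_2$ are finite source alphabets and $(\mathbf{X}_1^{(\theta)}, \mathbf{X}_2^{(\theta)})$ $(\theta \in \Theta)$ is stationary and memoryless subject to $P_{X_1^{(\theta)}X_2^{(\theta)}}$ $(\theta \in \Theta)$ (we simply write such a source pair as $\{(X_1^{(\theta)}, X_2^{(\theta)})\}$). Then, we have the following lemma, which corresponds to Lemma 1.4.4 in §1.4 of Chapter 1 and Lemma 3.3.3 in §3.3 of Chapter 3.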



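Lemma 7.3.3. Suppose that $\mathcal{X}_1$ and $\mathcal{X}_2$ are finite source alphabets. If each $\{(X_1^{(\theta)}, X_2^{(\theta)})\}$ is stationary and memoryless, then for the mixed source pair $(\mathbf{X}_1, \mathbf{X}_2) = \{(X_1^n, X_2^n)\}_{n=1}^{\infty}$ defined by (7.3.19) it holds that

$$
\int_{\{\theta \mid H(X_1^{(\theta)}|X_2^{(\theta)}) > R\}} dw(\theta) \le \underline{F}(R|\mathbf{X}_1|\mathbf{X}_2) \le \overline{F}(R|\mathbf{X}_1|\mathbf{X}_2) \le \int_{\{\theta \mid H(X_1^{(\theta)}|X_2^{(\theta)}) \ge R\}} dw(\theta) \quad (\forall R \ge 0),
\tag{7.3.26}
$$

$$
\int_{\{\theta \mid H(X_2^{(\theta)}|X_1^{(\theta)}) > R\}} dw(\theta) \le \underline{F}(R|\mathbf{X}_2|\mathbf{X}_1) \le \overline{F}(R|\mathbf{X}_2|\mathbf{X}_1) \le \int_{\{\theta \mid H(X_2^{(\theta)}|X_1^{(\theta)}) \ge R\}} dw(\theta) \quad (\forall R \ge 0),
\tag{7.3.27}
$$

$$
\int_{\{\theta \mid H(X_1^{(\theta)}X_2^{(\theta)}) > R\}} dw(\theta) \le \underline{F}(R|\mathbf{X}_1\mathbf{X}_2) \le \overline{F}(R|\mathbf{X}_1\mathbf{X}_2) \le \int_{\{\theta \mid H(X_1^{(\theta)}X_2^{(\theta)}) \ge R\}} dw(\theta) \quad (\forall R \ge 0),
\tag{7.3.28}
$$

where $H(X_1^{(\theta)}|X_2^{(\theta)})$, $H(X_2^{(\theta)}|X_1^{(\theta)})$ and $H(X_1^{(\theta)}X_2^{(\theta)})$ denote the (conditional) entropies, and the inequalities in (7.3.26)-(7.3.28) hold with equality except for at most countably infinite $R$.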

Proof. We have only to consider the joint type $T_{\mathbf{x}_1,\mathbf{x}_2}$ of $\mathbf{x}_1 \in \mathcal{X}_1^n$ and $\mathbf{x}_2 \in \mathcal{X}_2^n$ (see the proof of Lemma 3.3.3 in §3.3) and compute the information-spectra similarly to the proofs of Lemma 1.4.4 in §1.4 and Lemma 3.3.3 in §3.3. □



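Remark 7.3.1. Lemma 7.3.3 means that the (conditional) entropy-density rates

$$\frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)}, \quad \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)}, \quad \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)}$$

of the mixed source pair $(\mathbf{X}_1, \mathbf{X}_2) = \{(X_1^n, X_2^n)\}_{n=1}^{\infty}$ defined in the lemma are distributed according to the weight density $w(\theta)$ in the limit of $n \to \infty$ if we choose the (conditional) entropies

$$H(X_1^{(\theta)}|X_2^{(\theta)}), \quad H(X_2^{(\theta)}|X_1^{(\theta)}), \quad H(X_1^{(\theta)}X_2^{(\theta)})$$

as the horizontal axes. In this case, the (conditional) entropy rates of $(\mathbf{X}_1, \mathbf{X}_2)$ can be computed as

$$H(\mathbf{X}_1|\mathbf{X}_2) = \lim_{n\to\infty} \frac{1}{n} H(X_1^n|X_2^n) = \int_{\Theta} H(X_1^{(\theta)}|X_2^{(\theta)})\, dw(\theta), \tag{7.3.29}$$

$$H(\mathbf{X}_2|\mathbf{X}_1) = \lim_{n\to\infty} \frac{1}{n} H(X_2^n|X_1^n) = \int_{\Theta} H(X_2^{(\theta)}|X_1^{(\theta)})\, dw(\theta), \tag{7.3.30}$$

$$H(\mathbf{X}_1\mathbf{X}_2) = \lim_{n\to\infty} \frac{1}{n} H(X_1^nX_2^n) = \int_{\Theta} H(X_1^{(\theta)}X_2^{(\theta)})\, dw(\theta). \tag{7.3.31}$$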

The following theorem is immediately obtained from Theorem 7.2.1 and 
Lemma 7.3.3. 

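Theorem 7.3.4. For the mixed source pair $(\mathbf{X}_1, \mathbf{X}_2)$ defined in Lemma 7.3.3 the achievable rate region $R_f(\mathbf{X}_1, \mathbf{X}_2)$ is given by

$$R_f(\mathbf{X}_1, \mathbf{X}_2) = \mathcal{R}\bigl( \{(w(\theta); X_1^{(\theta)}, X_2^{(\theta)})\}_{\theta \in \Theta} \bigr), \tag{7.3.32}$$

where $\mathcal{R}\bigl( \{(w(\theta); X_1^{(\theta)}, X_2^{(\theta)})\}_{\theta \in \Theta} \bigr)$ denotes the collection of all $(R_1, R_2)$ satisfying

$$R_1 \ge w\text{-ess.sup}\, H(X_1^{(\theta)}|X_2^{(\theta)}), \tag{7.3.33}$$

$$R_2 \ge w\text{-ess.sup}\, H(X_2^{(\theta)}|X_1^{(\theta)}), \tag{7.3.34}$$

$$R_1 + R_2 \ge w\text{-ess.sup}\, H(X_1^{(\theta)}X_2^{(\theta)}). \tag{7.3.35}$$

Proof. We can prove this theorem similarly to the proof of Theorem 1.4.3 in §1.4 by using Theorem 7.2.1 and Lemma 7.3.3. □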



7.4 ε-Source Coding for Slepian-Wolf Source Coding System

In the source coding treated in the preceding sections the error probability is required to satisfy

$$\varepsilon_n = \Pr\bigl\{ (X_1^n, X_2^n) \ne \psi_n\bigl(\varphi_n^{(1)}(X_1^n), \varphi_n^{(2)}(X_2^n)\bigr) \bigr\} \to 0 \quad \text{as } n \to \infty.$$

In this section we consider the Slepian-Wolf source coding that is required to satisfy only

$$\limsup_{n\to\infty} \varepsilon_n \le \varepsilon$$

for an arbitrarily fixed constant $0 \le \varepsilon < 1$. Since the requirement on the error probability is weakened, we can expect that smaller rate pairs become achievable. We begin with giving definitions.



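Definition 7.4.1.

Rate pair $(R_1, R_2)$ is $\varepsilon$-achievable

$\stackrel{\text{def}}{\Longleftrightarrow}$ There exists an $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code satisfying $\limsup_{n\to\infty} \varepsilon_n \le \varepsilon$, $\limsup_{n\to\infty} \frac{1}{n}\log M_n^{(1)} \le R_1$ and $\limsup_{n\to\infty} \frac{1}{n}\log M_n^{(2)} \le R_2$.

Definition 7.4.2 ($\varepsilon$-Achievable rate region).

$$R_f(\varepsilon|\mathbf{X}_1, \mathbf{X}_2) = \{ (R_1, R_2) \mid (R_1, R_2) \text{ is } \varepsilon\text{-achievable} \}. \tag{7.4.1}$$

We call this two-dimensional region $R_f(\varepsilon|\mathbf{X}_1, \mathbf{X}_2)$ the $\varepsilon$-achievable rate region. The objective of this section is determination of this region. To this end, we first set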


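$$
\begin{aligned}
F_n(R_1, R_2) = \Pr\Bigl\{ &\frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge R_1 \ \text{ or } \ \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)} \ge R_2 \\
&\text{or } \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)} \ge R_1 + R_2 \Bigr\}
\end{aligned}
\tag{7.4.2}
$$

and define $F(R_1, R_2)$, a function with two variables, by

$$F(R_1, R_2) = \limsup_{n\to\infty} F_n(R_1, R_2). \tag{7.4.3}$$

Then, we have the following theorem, which is a generalized version of Theorem 1.6.1 in Chapter 1.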



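Theorem 7.4.1.

$$R_f(\varepsilon|\mathbf{X}_1, \mathbf{X}_2) = \mathrm{Cl}\bigl( \{ (R_1, R_2) \mid F(R_1, R_2) \le \varepsilon \} \bigr) \quad (0 \le \forall \varepsilon < 1), \tag{7.4.4}$$

where $\mathrm{Cl}(\cdot)$ on the right-hand side denotes the closure operation. □

Remark 7.4.1. The right-hand side of (7.4.4) is not always a convex set. □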



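Proof of Theorem 7.4.1.

1) Direct part:

For an arbitrary $(R_1, R_2)$ satisfying

$$(R_1, R_2) \in \mathrm{Cl}\bigl( \{ (R_1, R_2) \mid F(R_1, R_2) \le \varepsilon \} \bigr), \tag{7.4.5}$$

we define

$$M_n^{(1)} = e^{n(R_1 + 2\gamma)}, \qquad M_n^{(2)} = e^{n(R_2 + 2\gamma)},$$

where $\gamma > 0$ is an arbitrarily small constant. Then, Lemma 7.2.1 implies that there exists an $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code satisfying

$$
\begin{aligned}
\varepsilon_n &\le \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge R_1 + \gamma \ \text{ or } \ \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)} \ge R_2 + \gamma \\
&\qquad\quad \text{or } \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)} \ge R_1 + R_2 + 3\gamma \Bigr\} + 3e^{-n\gamma} \\
&\le \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge R_1 + \gamma \ \text{ or } \ \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)} \ge R_2 + \gamma \\
&\qquad\quad \text{or } \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)} \ge R_1 + R_2 + 2\gamma \Bigr\} + 3e^{-n\gamma}.
\end{aligned}
\tag{7.4.6}
$$

By taking $\limsup_{n\to\infty}$ of both sides of (7.4.6), it follows that

$$\limsup_{n\to\infty} \varepsilon_n \le F(R_1 + \gamma, R_2 + \gamma) \le \varepsilon. \tag{7.4.7}$$

Since $\gamma > 0$ is arbitrarily small, (7.4.7) means that every rate pair $(R_1, R_2)$ with (7.4.5) is $\varepsilon$-achievable.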

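2) Converse part:

If $(R_1, R_2)$ is $\varepsilon$-achievable, there exists an $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code satisfying

$$\limsup_{n\to\infty} \frac{1}{n}\log M_n^{(1)} \le R_1, \tag{7.4.8}$$

$$\limsup_{n\to\infty} \frac{1}{n}\log M_n^{(2)} \le R_2 \tag{7.4.9}$$

and

$$\limsup_{n\to\infty} \varepsilon_n \le \varepsilon. \tag{7.4.10}$$

Then, Lemma 7.2.2 tells us that

$$
\begin{aligned}
\varepsilon_n \ge \Pr\Bigl\{ &\frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge \frac{1}{n}\log M_n^{(1)} + \gamma \ \text{ or } \ \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)} \ge \frac{1}{n}\log M_n^{(2)} + \gamma \\
&\text{or } \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)} \ge \frac{1}{n}\log\bigl(M_n^{(1)}M_n^{(2)}\bigr) + \gamma \Bigr\} - 3e^{-n\gamma}
\end{aligned}
\tag{7.4.11}
$$

for an arbitrarily small $\gamma > 0$. On the other hand, since (7.4.8) and (7.4.9) lead to

$$\frac{1}{n}\log M_n^{(1)} \le R_1 + \gamma \quad (\forall n \ge n_0),$$

$$\frac{1}{n}\log M_n^{(2)} \le R_2 + \gamma \quad (\forall n \ge n_0),$$

respectively, by substituting these inequalities into (7.4.11) we obtain

$$
\begin{aligned}
\varepsilon_n &\ge \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge R_1 + 2\gamma \ \text{ or } \ \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)} \ge R_2 + 2\gamma \\
&\qquad\quad \text{or } \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)} \ge R_1 + R_2 + 3\gamma \Bigr\} - 3e^{-n\gamma} \\
&\ge \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge R_1 + 2\gamma \ \text{ or } \ \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)} \ge R_2 + 2\gamma \\
&\qquad\quad \text{or } \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)} \ge R_1 + R_2 + 4\gamma \Bigr\} - 3e^{-n\gamma}.
\end{aligned}
\tag{7.4.12}
$$

Therefore, in view of (7.4.10) it follows that

$$F(R_1 + 2\gamma, R_2 + 2\gamma) \le \limsup_{n\to\infty} \varepsilon_n \le \varepsilon, \tag{7.4.13}$$

which means

$$(R_1, R_2) \in \mathrm{Cl}\bigl( \{ (R_1, R_2) \mid F(R_1, R_2) \le \varepsilon \} \bigr)$$

because $\gamma > 0$ is arbitrary. □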






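Example 7.4.1. As an application of Theorem 7.4.1, let us consider the case that $(\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}, \mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty})$ is stationary and memoryless subject to a joint probability distribution $P_{X_1X_2}$. In this case, since Khintchin's law of large numbers (see Theorem 1.3.2 in Chapter 1) implies that

$$\frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)}, \quad \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)}, \quad \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)}$$

converge in probability to the (conditional) entropies

$$H(X_1|X_2), \quad H(X_2|X_1), \quad H(X_1X_2),$$

respectively, we have

$$R_f(\varepsilon|\mathbf{X}_1, \mathbf{X}_2) = \{ (R_1, R_2) \mid R_1 \ge H(X_1|X_2),\ R_2 \ge H(X_2|X_1),\ R_1 + R_2 \ge H(X_1X_2) \} \quad (0 \le \forall \varepsilon < 1).$$

Hence, $R_f(\varepsilon|\mathbf{X}_1, \mathbf{X}_2)$ of this case does not depend on $0 \le \varepsilon < 1$. □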
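The convergence in probability used in Example 7.4.1 can be observed numerically. The sketch below assumes a specific toy source that is not in the text: $X_2^n$ is uniform on $\{0,1\}^n$ and $X_1^n = X_2^n \oplus Z^n$ with $Z^n$ i.i.d. Bernoulli($p$), so that the conditional entropy-density rate depends only on the number $d$ of ones in $Z^n$ and its distribution is an exact binomial sum. The function name `spectrum_tail` is hypothetical, and logarithms are natural.

```python
import math


def spectrum_tail(n, p, R):
    """Pr{ (1/n) log 1/P(X1^n | X2^n) >= R } for X1^n = X2^n xor Z^n.

    With Z i.i.d. Bernoulli(p), P(x1|x2) = p^d (1-p)^(n-d), where d is the
    number of positions in which x1 and x2 differ, and d ~ Binomial(n, p).
    """
    total = 0.0
    for d in range(n + 1):
        density = -(d * math.log(p) + (n - d) * math.log(1 - p)) / n
        if density >= R:
            total += math.comb(n, d) * p**d * (1 - p)**(n - d)
    return total
```

For $p = 0.1$ the conditional entropy is about 0.325 nats. Evaluating the tail at $n = 1000$ for a threshold above this value (say $R = 0.40$) and for one below it (say $R = 0.25$) gives probabilities close to 0 and close to 1, respectively, which is why $F(R_1, R_2)$ takes only the values 0 and 1 away from the boundary and the region does not depend on $\varepsilon$.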



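Example 7.4.2. Let us consider another application of Theorem 7.4.1. Let

$$(\mathbf{X}_1^{(1)} = \{X_1^{(1)n}\}_{n=1}^{\infty}, \mathbf{X}_2^{(1)} = \{X_2^{(1)n}\}_{n=1}^{\infty}), \qquad (\mathbf{X}_1^{(2)} = \{X_1^{(2)n}\}_{n=1}^{\infty}, \mathbf{X}_2^{(2)} = \{X_2^{(2)n}\}_{n=1}^{\infty})$$

be two pairs of stationary memoryless sources subject to joint probability distributions $P_{X_1^{(1)}X_2^{(1)}}$ and $P_{X_1^{(2)}X_2^{(2)}}$, respectively, and consider the mixed source pair $(\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}, \mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty})$ defined by

$$P_{X_1^nX_2^n}(\mathbf{x}_1,\mathbf{x}_2) = \alpha_1 P_{X_1^{(1)n}X_2^{(1)n}}(\mathbf{x}_1,\mathbf{x}_2) + \alpha_2 P_{X_1^{(2)n}X_2^{(2)n}}(\mathbf{x}_1,\mathbf{x}_2)$$

for some $\alpha_1$ and $\alpha_2$ satisfying $\alpha_1 > 0$, $\alpha_2 > 0$ and $\alpha_1 + \alpha_2 = 1$. Then, the $\varepsilon$-achievable rate region $R_f(\varepsilon|\mathbf{X}_1, \mathbf{X}_2)$ varies according to values of the (conditional) entropies

$$H(X_1^{(1)}|X_2^{(1)}), \quad H(X_2^{(1)}|X_1^{(1)}), \quad H(X_1^{(1)}X_2^{(1)}),$$

$$H(X_1^{(2)}|X_2^{(2)}), \quad H(X_2^{(2)}|X_1^{(2)}), \quad H(X_1^{(2)}X_2^{(2)}).$$

That is, by carefully considering the convergence in probability of the entropy-spectra as was shown in Remark 1.4.1 in §1.4, we have

$$R_f(\varepsilon|\mathbf{X}_1, \mathbf{X}_2) = \{ (R_1, R_2) \mid R_1 \ge H(X_1^{(1)}|X_2^{(1)}),\ R_2 \ge H(X_2^{(1)}|X_1^{(1)}),\ R_1 + R_2 \ge H(X_1^{(1)}X_2^{(1)}) \} \quad (0 \le \forall \varepsilon < \alpha_1),$$

$$R_f(\varepsilon|\mathbf{X}_1, \mathbf{X}_2) = \{ (R_1, R_2) \mid R_1 \ge H(X_1^{(2)}|X_2^{(2)}),\ R_2 \ge H(X_2^{(2)}|X_1^{(2)}),\ R_1 + R_2 \ge H(X_1^{(2)}X_2^{(2)}) \} \quad (\alpha_1 \le \forall \varepsilon < 1)$$

for the case of

$$H(X_1^{(1)}|X_2^{(1)}) \ge H(X_1^{(2)}|X_2^{(2)}),$$

$$H(X_2^{(1)}|X_1^{(1)}) \ge H(X_2^{(2)}|X_1^{(2)}),$$

$$H(X_1^{(1)}X_2^{(1)}) \ge H(X_1^{(2)}X_2^{(2)}),$$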

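and

$$R_f(\varepsilon|\mathbf{X}_1, \mathbf{X}_2) = \{ (R_1, R_2) \mid R_1 \ge H(X_1^{(1)}|X_2^{(1)}),\ R_2 \ge H(X_2^{(2)}|X_1^{(2)}),\ R_1 + R_2 \ge H(X_1^{(1)}X_2^{(1)}) \} \quad (0 \le \forall \varepsilon < \alpha_1),$$

$$R_f(\varepsilon|\mathbf{X}_1, \mathbf{X}_2) = \{ (R_1, R_2) \mid R_1 \ge H(X_1^{(2)}|X_2^{(2)}),\ R_2 \ge H(X_2^{(2)}|X_1^{(2)}),\ R_1 + R_2 \ge H(X_1^{(2)}X_2^{(2)}) \} \quad (\alpha_1 \le \forall \varepsilon < \alpha_2),$$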

i?/(£|Xi,X2) = {{Ri,R2)\Ri > ),i?2 > /f(xW|xf>), 

/il + i?2 > F(xf ))} (tt2 < V£ < 1) 

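for the case of

$$H(X_1^{(1)}|X_2^{(1)}) \ge H(X_1^{(2)}|X_2^{(2)}),$$

$$H(X_2^{(1)}|X_1^{(1)}) \le H(X_2^{(2)}|X_1^{(2)}),$$

$$H(X_1^{(1)}X_2^{(1)}) \ge H(X_1^{(2)}X_2^{(2)})$$

and $\alpha_1 < \alpha_2$. □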
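For a mixture of finitely many memoryless pairs, the test $F(R_1, R_2) \le \varepsilon$ of Theorem 7.4.1 can be evaluated away from the boundary by summing the weights of the components whose own Slepian-Wolf constraints are violated. The sketch below rests on that reading and on labeled assumptions: the component entropies are given numbers, a constraint counts as violated only under strict inequality (boundary points are left to the closure operation), and the function names `limiting_error_mass` and `is_eps_achievable` are hypothetical.

```python
def limiting_error_mass(r1, r2, components):
    """Total weight of components whose constraints the pair (r1, r2) violates.

    components: list of (alpha_i, (h1g2_i, h2g1_i, h12_i)) holding the weight
    and the entropies H(X1|X2), H(X2|X1), H(X1X2) of the i-th memoryless pair.
    """
    return sum(alpha
               for alpha, (h1g2, h2g1, h12) in components
               if r1 < h1g2 or r2 < h2g1 or r1 + r2 < h12)


def is_eps_achievable(r1, r2, eps, components):
    """Interior-point test of F(R1, R2) <= eps for the mixed source pair."""
    return limiting_error_mass(r1, r2, components) <= eps
```

With weights 0.3 and 0.7 and entropies ordered as in the second case of Example 7.4.2, a rate pair that satisfies only the second component's constraints has limiting error mass 0.3, so it is accepted for $\varepsilon = 0.3$ but rejected for $\varepsilon = 0.2$.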

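Remark 7.4.2. We can generalize Example 7.4.2 if $\mathcal{X}_1$ and $\mathcal{X}_2$ are finite source alphabets. That is, if we consider the mixed source pair $(\mathbf{X}_1, \mathbf{X}_2)$ of stationary memoryless source pairs $(\mathbf{X}_1^{(\theta)}, \mathbf{X}_2^{(\theta)})$ $(\theta \in \Theta)$, defined in Lemma 7.3.3, (7.4.4) in Theorem 7.4.1 can be expressed in a computable form as follows:

$$
R_f(\varepsilon|\mathbf{X}_1, \mathbf{X}_2) = \mathrm{Cl}\Bigl( \Bigl\{ (R_1, R_2) \Bigm| \int_{\{\theta \mid H(X_1^{(\theta)}|X_2^{(\theta)}) \ge R_1 \text{ or } H(X_2^{(\theta)}|X_1^{(\theta)}) \ge R_2 \text{ or } H(X_1^{(\theta)}X_2^{(\theta)}) \ge R_1 + R_2\}} dw(\theta) \le \varepsilon \Bigr\} \Bigr) \quad (0 \le \forall \varepsilon < 1)
\tag{7.4.14}
$$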

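(generalization of Theorem 7.3.4). This equation can be obtained in the following way. Defining

$$
\begin{aligned}
\underline{F}(R_1, R_2|\mathbf{X}_1, \mathbf{X}_2) = \liminf_{n\to\infty} \Pr\Bigl\{ &\frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge R_1 \ \text{ or } \ \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)} \ge R_2 \\
&\text{or } \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)} \ge R_1 + R_2 \Bigr\},
\end{aligned}
\tag{7.4.15}
$$

$$
\begin{aligned}
\overline{F}(R_1, R_2|\mathbf{X}_1, \mathbf{X}_2) = \limsup_{n\to\infty} \Pr\Bigl\{ &\frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge R_1 \ \text{ or } \ \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)} \ge R_2 \\
&\text{or } \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)} \ge R_1 + R_2 \Bigr\}
\end{aligned}
\tag{7.4.16}
$$

instead of $\underline{F}(R|\mathbf{X}_1\mathbf{X}_2)$ and $\overline{F}(R|\mathbf{X}_1\mathbf{X}_2)$ in Lemma 7.3.3, it can be verified that

$$
\begin{aligned}
&\int_{\{\theta \mid H(X_1^{(\theta)}|X_2^{(\theta)}) > R_1 \text{ or } H(X_2^{(\theta)}|X_1^{(\theta)}) > R_2 \text{ or } H(X_1^{(\theta)}X_2^{(\theta)}) > R_1 + R_2\}} dw(\theta) \\
&\qquad \le \underline{F}(R_1, R_2|\mathbf{X}_1, \mathbf{X}_2) \\
&\qquad \le \overline{F}(R_1, R_2|\mathbf{X}_1, \mathbf{X}_2) \\
&\qquad \le \int_{\{\theta \mid H(X_1^{(\theta)}|X_2^{(\theta)}) \ge R_1 \text{ or } H(X_2^{(\theta)}|X_1^{(\theta)}) \ge R_2 \text{ or } H(X_1^{(\theta)}X_2^{(\theta)}) \ge R_1 + R_2\}} dw(\theta)
\end{aligned}
\tag{7.4.17}
$$

for all $R_1 \ge 0$ and $R_2 \ge 0$ from careful checking of arguments used in the proof of Lemma 7.3.3. □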



7.5 Strong Converse Theorem for Slepian-Wolf Source 
Coding System 

Theorem 7.2.1 on the Slepian-Wolf source coding consists of a direct part and a converse part. What happens if we require the strong converse property in the converse part? This section is devoted to the investigation of the strong converse property.



Definition 7.5.1. In the Slepian-Wolf source coding, a source pair $(\mathbf{X}_1, \mathbf{X}_2)$ is said to satisfy the strong converse property if any $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code with

$$\limsup_{n\to\infty} \frac{1}{n}\log M_n^{(1)} \le R_1, \tag{7.5.1}$$

$$\limsup_{n\to\infty} \frac{1}{n}\log M_n^{(2)} \le R_2 \tag{7.5.2}$$

for an arbitrary $(R_1, R_2) \notin R_f(\mathbf{X}_1, \mathbf{X}_2)$ satisfies

$$\lim_{n\to\infty} \varepsilon_n = 1. \tag{7.5.3}$$



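Defining the spectral (conditional) inf-entropy rates by

$$\underline{H}(\mathbf{X}_1|\mathbf{X}_2) = \text{p-}\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)}, \tag{7.5.4}$$

$$\underline{H}(\mathbf{X}_2|\mathbf{X}_1) = \text{p-}\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)}, \tag{7.5.5}$$

$$\underline{H}(\mathbf{X}_1\mathbf{X}_2) = \text{p-}\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)}, \tag{7.5.6}$$

we have the following theorem on the strong converse property.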



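Theorem 7.5.1 (Han [36]). Suppose that $\mathcal{X}_1$ and $\mathcal{X}_2$ are countably infinite source alphabets. A source pair $(\mathbf{X}_1, \mathbf{X}_2)$ satisfies the strong converse property if and only if

a) for the case of $\overline{H}(\mathbf{X}_1\mathbf{X}_2) > \overline{H}(\mathbf{X}_1|\mathbf{X}_2) + \overline{H}(\mathbf{X}_2|\mathbf{X}_1)$:

$$\underline{H}(\mathbf{X}_1|\mathbf{X}_2) = \overline{H}(\mathbf{X}_1|\mathbf{X}_2), \tag{7.5.7}$$

$$\underline{H}(\mathbf{X}_2|\mathbf{X}_1) = \overline{H}(\mathbf{X}_2|\mathbf{X}_1), \tag{7.5.8}$$

$$\underline{H}(\mathbf{X}_1\mathbf{X}_2) = \overline{H}(\mathbf{X}_1\mathbf{X}_2) \tag{7.5.9}$$

are satisfied,

b) for the case of $\overline{H}(\mathbf{X}_1\mathbf{X}_2) \le \overline{H}(\mathbf{X}_1|\mathbf{X}_2) + \overline{H}(\mathbf{X}_2|\mathbf{X}_1)$:

(7.5.7) and (7.5.8) are satisfied.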



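Proof.

1) Sufficiency:

In view of Theorem 7.2.1, for either of the cases a) or b), $(R_1, R_2) \notin R_f(\mathbf{X}_1, \mathbf{X}_2)$ means that at least one of the inequalities (7.2.10)-(7.2.12) does not hold and vice versa. That is, if $(R_1, R_2) \notin R_f(\mathbf{X}_1, \mathbf{X}_2)$, at least one of

$$R_1 < \overline{H}(\mathbf{X}_1|\mathbf{X}_2), \tag{7.5.10}$$

$$R_2 < \overline{H}(\mathbf{X}_2|\mathbf{X}_1), \tag{7.5.11}$$

$$R_1 + R_2 < \overline{H}(\mathbf{X}_1\mathbf{X}_2) \tag{7.5.12}$$

must hold. First, we consider the case that (7.5.10) holds. Setting $R_1 = \overline{H}(\mathbf{X}_1|\mathbf{X}_2) - 3\gamma$ $(\gamma > 0)$, we consider an arbitrary $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code satisfying (7.5.1) for this $R_1$. Since (7.2.30) in Lemma 7.2.2 tells us that

$$\varepsilon_n \ge \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge \frac{1}{n}\log M_n^{(1)} + \gamma \Bigr\} - 3e^{-n\gamma},$$

it follows from (7.5.1) that for all $n \ge n_0$

$$
\begin{aligned}
\varepsilon_n &\ge \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge R_1 + 2\gamma \Bigr\} - 3e^{-n\gamma} \\
&= \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge \overline{H}(\mathbf{X}_1|\mathbf{X}_2) - \gamma \Bigr\} - 3e^{-n\gamma}.
\end{aligned}
$$

If (7.5.7) holds, the first term on the right-hand side goes to 1 as $n \to \infty$ from the definition of $\underline{H}(\mathbf{X}_1|\mathbf{X}_2)$. This means $\varepsilon_n \to 1$ as $n \to \infty$ because $e^{-n\gamma} \to 0$ as $n \to \infty$.

For the cases that (7.5.11) or (7.5.12) holds, we can similarly prove that $\varepsilon_n \to 1$ as $n \to \infty$ by using (7.5.8), (7.5.9) and Lemma 7.2.2.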



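2) Necessity:

We only prove this part for the case a) because we can similarly prove it for the case b). We choose a pair of rates $(R_1, R_2)$ satisfying one of (7.5.10)-(7.5.12) and develop the equalities (7.5.7)-(7.5.9). First, for the case that (7.5.10) holds, consider

$$R_1 = \overline{H}(\mathbf{X}_1|\mathbf{X}_2) - \gamma,$$

where $\gamma > 0$ is an arbitrarily small constant. Next, choose $R_2$ satisfying

$$R_2 \ge \overline{H}(\mathbf{X}_2|\mathbf{X}_1) + 2\gamma,$$

$$R_1 + R_2 \ge \overline{H}(\mathbf{X}_1\mathbf{X}_2) + 2\gamma$$

arbitrarily and define $M_n^{(1)} = e^{nR_1}$ and $M_n^{(2)} = e^{nR_2}$. Then, Lemma 7.2.1 guarantees the existence of an $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code satisfying

$$
\begin{aligned}
\varepsilon_n &\le \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge R_1 - \gamma \ \text{ or } \ \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)} \ge R_2 - \gamma \\
&\qquad\quad \text{or } \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)} \ge R_1 + R_2 - \gamma \Bigr\} + 3e^{-n\gamma} \\
&\le \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge R_1 - \gamma \Bigr\} + \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)} \ge R_2 - \gamma \Bigr\} \\
&\qquad + \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)} \ge R_1 + R_2 - \gamma \Bigr\} + 3e^{-n\gamma} \\
&\le \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge \overline{H}(\mathbf{X}_1|\mathbf{X}_2) - 2\gamma \Bigr\} + \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_2^n|X_1^n}(X_2^n|X_1^n)} \ge \overline{H}(\mathbf{X}_2|\mathbf{X}_1) + \gamma \Bigr\} \\
&\qquad + \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^nX_2^n}(X_1^n,X_2^n)} \ge \overline{H}(\mathbf{X}_1\mathbf{X}_2) + \gamma \Bigr\} + 3e^{-n\gamma}.
\end{aligned}
\tag{7.5.13}
$$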

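Notice here that the second and the third terms on the right-hand side of (7.5.13), as well as the fourth term, converge to 0 as $n \to \infty$ due to the definitions of $\overline{H}(\mathbf{X}_2|\mathbf{X}_1)$ and $\overline{H}(\mathbf{X}_1\mathbf{X}_2)$. Hence, we have

$$\liminf_{n\to\infty} \varepsilon_n \le \liminf_{n\to\infty} \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge \overline{H}(\mathbf{X}_1|\mathbf{X}_2) - 2\gamma \Bigr\}. \tag{7.5.14}$$

On the other hand, $\liminf_{n\to\infty} \varepsilon_n = 1$ must be satisfied from the assumption of the strong converse property. Thus, we obtain

$$\lim_{n\to\infty} \Pr\Bigl\{ \frac{1}{n}\log\frac{1}{P_{X_1^n|X_2^n}(X_1^n|X_2^n)} \ge \overline{H}(\mathbf{X}_1|\mathbf{X}_2) - 2\gamma \Bigr\} = 1$$

from (7.5.14), which means

$$\overline{H}(\mathbf{X}_1|\mathbf{X}_2) - 2\gamma \le \underline{H}(\mathbf{X}_1|\mathbf{X}_2).$$

By letting $\gamma \to 0$, we obtain $\overline{H}(\mathbf{X}_1|\mathbf{X}_2) \le \underline{H}(\mathbf{X}_1|\mathbf{X}_2)$. Since $\overline{H}(\mathbf{X}_1|\mathbf{X}_2) \ge \underline{H}(\mathbf{X}_1|\mathbf{X}_2)$ always holds, $\overline{H}(\mathbf{X}_1|\mathbf{X}_2) = \underline{H}(\mathbf{X}_1|\mathbf{X}_2)$ follows.

We can prove $\overline{H}(\mathbf{X}_2|\mathbf{X}_1) = \underline{H}(\mathbf{X}_2|\mathbf{X}_1)$ similarly. While this completes the proof of the theorem for the case b), we must develop $\overline{H}(\mathbf{X}_1\mathbf{X}_2) = \underline{H}(\mathbf{X}_1\mathbf{X}_2)$ for the case a). Notice that, for the case a), we can choose $(R_1, R_2)$ satisfying

$$R_1 \ge \overline{H}(\mathbf{X}_1|\mathbf{X}_2) + 2\gamma,$$

$$R_2 \ge \overline{H}(\mathbf{X}_2|\mathbf{X}_1) + 2\gamma,$$

$$R_1 + R_2 = \overline{H}(\mathbf{X}_1\mathbf{X}_2) - \gamma,$$

where $\gamma > 0$ is an arbitrarily small constant. Defining $M_n^{(1)} = e^{nR_1}$ and $M_n^{(2)} = e^{nR_2}$, we can similarly obtain $\overline{H}(\mathbf{X}_1\mathbf{X}_2) = \underline{H}(\mathbf{X}_1\mathbf{X}_2)$ by using Lemma 7.2.1. □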



Remark 7.5.1. If $(\mathbf{X}_1, \mathbf{X}_2)$ is stationary and memoryless or stationary and ergodic, the conditions (7.5.7)-(7.5.9) in Theorem 7.5.1 a) are satisfied. Thus, such source pairs satisfy the strong converse property. However, the mixed source pairs do not satisfy the strong converse property because (7.5.7)-(7.5.9) are not satisfied in general. In addition, if $(\mathbf{X}_1, \mathbf{X}_2)$ satisfies the strong converse property, we have

$$R_f(\varepsilon|\mathbf{X}_1, \mathbf{X}_2) = R_f(\mathbf{X}_1, \mathbf{X}_2) \quad (0 \le \forall \varepsilon < 1). \tag{7.5.15}$$

Therefore, the result in Example 7.4.1 follows from the strong converse property. Nevertheless, (7.5.15) does not always imply the strong converse property. □



7.6 Multiple-Access Channel Coding Systems

Hereafter, we consider coding problems on a general multiple-access channel. The multiple-access channel is a channel with two input terminals and one output terminal. More precisely, letting $\mathcal{X}_1$ and $\mathcal{X}_2$ be arbitrary input alphabets and $\mathcal{Y}$ an arbitrary output alphabet ($\mathcal{X}_1$, $\mathcal{X}_2$ and $\mathcal{Y}$ can be countably infinite or continuous sets), we call the sequence $\mathbf{W} = \{W^n(\cdot|\cdot,\cdot)\}_{n=1}^{\infty}$ of conditional probability distributions $W^n : \mathcal{X}_1^n \times \mathcal{X}_2^n \to \mathcal{Y}^n$ a general multiple-access channel, where the conditional probability distribution $W^n$ is arbitrary as far as it satisfies

$$\sum_{\mathbf{y} \in \mathcal{Y}^n} W^n(\mathbf{y}|\mathbf{x}_1,\mathbf{x}_2) = 1 \quad (\forall n = 1, 2, \cdots;\ \forall \mathbf{x}_1 \in \mathcal{X}_1^n,\ \forall \mathbf{x}_2 \in \mathcal{X}_2^n).$$

Notice that the multiple-access channel $\mathbf{W}$ defined in this way is quite general; $\mathbf{W}$ can be nonstationary or nonergodic and have any kind of memory structure.

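Information is transmitted through this multiple-access channel in the
following way. For two message sets

$$\mathcal{M}_n^{(1)} = \{1, 2, \cdots, M_n^{(1)}\}, \quad \mathcal{M}_n^{(2)} = \{1, 2, \cdots, M_n^{(2)}\},$$

we call two mappings $\varphi_n^{(1)} : \mathcal{M}_n^{(1)} \to \mathcal{X}_1^n$ and $\varphi_n^{(2)} : \mathcal{M}_n^{(2)} \to \mathcal{X}_2^n$
encoder 1 and encoder 2, respectively. Here, $\mathbf{u}_i = \varphi_n^{(1)}(i)$ and $\mathbf{v}_j = \varphi_n^{(2)}(j)$
are called codewords for messages $i \in \mathcal{M}_n^{(1)}$ and $j \in \mathcal{M}_n^{(2)}$, respectively, and
$\mathcal{C}_n^{(1)} = \{\mathbf{u}_1, \mathbf{u}_2, \cdots, \mathbf{u}_{M_n^{(1)}}\}$ and $\mathcal{C}_n^{(2)} = \{\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_{M_n^{(2)}}\}$ are called the codes of
encoders 1 and 2, respectively. When the encoder $\varphi_n^{(1)}$ transmits a message
$i \in \mathcal{M}_n^{(1)}$, it inputs the codeword $\mathbf{u}_i$ to one of the input terminals (terminal 1).
Simultaneously, the encoder $\varphi_n^{(2)}$ inputs the codeword $\mathbf{v}_j$ corresponding
to a message $j \in \mathcal{M}_n^{(2)}$ to the other terminal (terminal 2) when it
transmits the message $j$ (simultaneous coding). Note that the encoder $\varphi_n^{(1)}$
cannot see the message $j \in \mathcal{M}_n^{(2)}$ and the encoder $\varphi_n^{(2)}$ cannot see the message
$i \in \mathcal{M}_n^{(1)}$ (separate coding).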

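On the other hand, letting $\mathbf{y}$ be an output from the multiple-access channel,
a decoder judges that $(i, j) \in \mathcal{M}_n^{(1)} \times \mathcal{M}_n^{(2)}$ is transmitted if $\mathbf{y} \in \mathcal{D}_{ij}$,
where

$$\mathcal{Y}^n = \bigcup_{i=1}^{M_n^{(1)}} \bigcup_{j=1}^{M_n^{(2)}} \mathcal{D}_{ij} \quad \left(\mathcal{D}_{ij} \cap \mathcal{D}_{i'j'} = \emptyset \ \text{for}\ (i,j) \neq (i',j')\right)$$

is a disjoint partition of $\mathcal{Y}^n$ determined in advance. The region $\mathcal{D}_{ij}$ is called
the decoding region for a message pair $(i, j)$. This operation is called decoding.
The mapping $\psi_n : \mathcal{Y}^n \to \mathcal{M}_n^{(1)} \times \mathcal{M}_n^{(2)}$ specifying this operation is called a
decoder (Fig. 7.3).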




[Figure not reproduced: block diagram of the multiple-access channel coding system.]

Fig. 7.3.



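Next, we call

$$r_n^{(1)} = \frac{1}{n}\log M_n^{(1)}, \quad r_n^{(2)} = \frac{1}{n}\log M_n^{(2)}$$

the coding rates of the encoders 1 and 2. We define the error probability
$\varepsilon_n$ by

$$\varepsilon_n = \frac{1}{M_n^{(1)} M_n^{(2)}} \sum_{i=1}^{M_n^{(1)}} \sum_{j=1}^{M_n^{(2)}} W^n(\mathcal{D}_{ij}^c \,|\, \mathbf{u}_i, \mathbf{v}_j), \tag{7.6.1}$$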



where the superscript "c" denotes the complement. Notice that $\varepsilon_n$ is defined
as the average of the error probability under the assumption that messages
$i \in \mathcal{M}_n^{(1)}$ and $j \in \mathcal{M}_n^{(2)}$ are generated subject to the uniform distributions
over $\mathcal{M}_n^{(1)}$ and $\mathcal{M}_n^{(2)}$, respectively. Hereafter, we call the code

$$\left(\mathcal{C}_n^{(1)} = \{\mathbf{u}_1, \mathbf{u}_2, \cdots, \mathbf{u}_{M_n^{(1)}}\},\ \mathcal{C}_n^{(2)} = \{\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_{M_n^{(2)}}\}\right)$$

with the sets of messages $\mathcal{M}_n^{(1)}$ and $\mathcal{M}_n^{(2)}$ of sizes $M_n^{(1)}$ and $M_n^{(2)}$, respectively,
and the error probability $\varepsilon_n$ an $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code.

Coding problems on the multiple-access channel are formulated as maximization
of the rate pair $(r_n^{(1)}, r_n^{(2)})$ subject to the requirement that there
exists a triplet $(\varphi_n^{(1)}, \varphi_n^{(2)}, \psi_n)$ with the error probability less than a certain
given value. We note here, however, that, as long as we try to keep the error
probability below the given value, there is a "trade-off" between $r_n^{(1)}$ and
$r_n^{(2)}$: we need to make one of $r_n^{(1)}$ and $r_n^{(2)}$ small if we make the
other large. Therefore, in coding problems of the multiple-access channel it is
fundamental to determine the two-dimensional region of all realizable rate pairs
$(r_n^{(1)}, r_n^{(2)})$. In this respect the coding problems of the multiple-access channel
differ from the single-user channel coding problems defined in Chapter 3.
In this section we first consider the case that the error probability $\varepsilon_n$
is required to satisfy $\varepsilon_n \to 0$ as $n \to \infty$. We give the following definitions:

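Definition 7.6.1.

Rate pair $(R_1, R_2)$ is achievable

$\overset{\text{def}}{\Longleftrightarrow}$ There exists an $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code satisfying

$$\lim_{n\to\infty} \varepsilon_n = 0, \quad \liminf_{n\to\infty} \frac{1}{n}\log M_n^{(1)} \ge R_1 \quad \text{and} \quad \liminf_{n\to\infty} \frac{1}{n}\log M_n^{(2)} \ge R_2.$$

Definition 7.6.2 (Capacity region).

$$C(\mathbf{W}) = \{(R_1, R_2) \mid R_1 \ge 0,\ R_2 \ge 0 \text{ and } (R_1, R_2) \text{ is achievable}\}.$$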



In the following section we give a general formula for determining the capacity 
region C(W) of a multiple-access channel W under these definitions. It is 
obvious from the definitions above that the capacity region C(W) forms a 
“closed region” in the two-dimensional plane. 
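As a concrete illustration of the coding rates and of the average error probability (7.6.1), the following Python sketch evaluates both for a toy code. The channel (a binary modulo-2 adder with i.i.d. noise of crossover probability `p`), the repetition code and the majority-vote decoder are assumptions chosen only for illustration; they do not appear in the text. Messages are indexed from 0 in the code, whereas the text indexes them from 1.

```python
import math
from itertools import product

def coding_rates(n, m1, m2):
    """Coding rates (1/n) log M_n^(1) and (1/n) log M_n^(2), natural logarithm."""
    return math.log(m1) / n, math.log(m2) / n

def xor_channel(p):
    """W^n(y | x1, x2) for y_i = x1_i XOR x2_i XOR z_i with z_i i.i.d. Bernoulli(p).

    This toy channel is an assumption made for the example."""
    def w(y, x1, x2):
        prob = 1.0
        for yi, ai, bi in zip(y, x1, x2):
            prob *= p if (yi ^ ai ^ bi) else 1.0 - p
        return prob
    return w

def average_error(w, code1, code2, decoder, n):
    """Average error probability (7.6.1) under uniformly distributed messages."""
    total = 0.0
    for i, u in enumerate(code1):
        for j, v in enumerate(code2):
            # W^n(D_ij^c | u_i, v_j): mass of the outputs decoded to another pair
            total += sum(w(y, u, v)
                         for y in product((0, 1), repeat=n)
                         if decoder(y) != (i, j))
    return total / (len(code1) * len(code2))

n, p = 3, 0.1
code1 = [(0, 0, 0), (1, 1, 1)]   # encoder 1: repetition code, two messages
code2 = [(0, 0, 0)]              # encoder 2: a single message (rate 0)

def decoder(y):
    """Majority vote for message i; message j is always 0."""
    return (1 if sum(y) >= 2 else 0, 0)

rates = coding_rates(n, len(code1), len(code2))
error = average_error(xor_channel(p), code1, code2, decoder, n)
print(rates, error)
```

For this toy code an error occurs exactly when at least two of the three noise bits equal 1, so the computed value agrees with $3p^2(1-p) + p^3$.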



7.7 General Capacity Region Theorem for
Multiple-Access Channels

In this section we describe a general formula (Theorem 7.7.1) for the capacity 
region C(W) of a multiple-access channel W and give an example as an 
application of the general formula. The proof of this general formula will be 
given in §7.10. 

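First, we consider two general sources (input processes)

$$\mathbf{X}_1 = \{X_1^n = (X_{11}^{(n)}, \cdots, X_{1n}^{(n)})\}_{n=1}^{\infty}, \quad \mathbf{X}_2 = \{X_2^n = (X_{21}^{(n)}, \cdots, X_{2n}^{(n)})\}_{n=1}^{\infty}$$

as were given in Chapter 1, where for each $1 \le i \le n$, $X_{1i}^{(n)}$ and $X_{2i}^{(n)}$ are
random variables taking values in input alphabets $\mathcal{X}_1$ and $\mathcal{X}_2$, respectively. If

$$P_{X_1^n X_2^n}(\mathbf{x}_1, \mathbf{x}_2) = P_{X_1^n}(\mathbf{x}_1) P_{X_2^n}(\mathbf{x}_2)$$

is satisfied for all $n = 1, 2, \cdots$, $\mathbf{x}_1 \in \mathcal{X}_1^n$ and $\mathbf{x}_2 \in \mathcal{X}_2^n$, the two input processes
$\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$ and $\mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}$ are called independent and expressed as
$\mathbf{X}_1 \perp \mathbf{X}_2$. Hereafter, the collection of all $(\mathbf{X}_1, \mathbf{X}_2)$ satisfying $\mathbf{X}_1 \perp \mathbf{X}_2$ is denoted
by $\mathcal{S}_I$.

For a given pair of input processes $(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_I$ we define the channel
output

$$\mathbf{Y} = \{Y^n = (Y_1^{(n)}, \cdots, Y_n^{(n)})\}_{n=1}^{\infty}$$

by

$$P_{X_1^n X_2^n Y^n}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) = P_{X_1^n}(\mathbf{x}_1) P_{X_2^n}(\mathbf{x}_2) W^n(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2)$$

for all $n = 1, 2, \cdots$, $\mathbf{x}_1 \in \mathcal{X}_1^n$, $\mathbf{x}_2 \in \mathcal{X}_2^n$ and $\mathbf{y} \in \mathcal{Y}^n$, where $Y_i^{(n)}$ is a random
variable taking values in an output alphabet $\mathcal{Y}$.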

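In coding problems of the multiple-access channel, the following three
limit inferiors in probability (see Chapter 1) play an important role. We
define

$$\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) = \text{p-}\liminf_{n\to\infty} \frac{1}{n}\log\frac{W^n(Y^n|X_1^n, X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)}, \tag{7.7.1}$$

$$\underline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1) = \text{p-}\liminf_{n\to\infty} \frac{1}{n}\log\frac{W^n(Y^n|X_1^n, X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)}, \tag{7.7.2}$$

$$\underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}) = \text{p-}\liminf_{n\to\infty} \frac{1}{n}\log\frac{W^n(Y^n|X_1^n, X_2^n)}{P_{Y^n}(Y^n)} \tag{7.7.3}$$

and call them the spectral (conditional) inf-mutual information rates of a
channel $\mathbf{W}$. By defining $\mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)$ as the collection of all $(R_1, R_2)$ satisfying

$$0 \le R_1 \le \underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2), \tag{7.7.4}$$

$$0 \le R_2 \le \underline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1), \tag{7.7.5}$$

$$R_1 + R_2 \le \underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}), \tag{7.7.6}$$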

we have the following quite general theorem: 



Theorem 7.7.1 (Han [36]). For arbitrary input alphabets $\mathcal{X}_1$ and $\mathcal{X}_2$ and an
output alphabet $\mathcal{Y}$ ($\mathcal{X}_1$, $\mathcal{X}_2$ and $\mathcal{Y}$ can be arbitrary (not restricted to finite)
sets), the capacity region $C(\mathbf{W})$ of a multiple-access channel $\mathbf{W}$ is given by

$$C(\mathbf{W}) = \bigcup_{(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_I} \mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2). \tag{7.7.7}$$

The proof of this theorem is postponed until §7.10. In fact, the basic idea
of the proof is a natural extension of the proof of Theorem 3.2.1
given in Chapter 3. We conclude this section by giving a few remarks and an
example on Theorem 7.7.1.

Remark 7.7.1. The set on the right-hand side of (7.7.7) is closed (see the 
argument in Remark 5.7.5 in Chapter 5). □ 

Remark 7.7.2. As is easily seen, for a general multiple-access channel $\mathbf{W}$
the set on the right-hand side of (7.7.7) is not always convex, though the
set is always convex when we consider stationary memoryless multiple-access
channels (cf. Cover and Thomas [17]). This means that the time-sharing
principle, which holds for stationary memoryless channels, does not always
hold for general multiple-access channels. □

Remark 7.7.3. For a general multiple-access channel $\mathbf{W}$ one of the inequalities
(7.7.4)-(7.7.6) can be redundant, which is different from the case of
stationary memoryless channels (e.g., Cover and Thomas [17]). □

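Example 7.7.1. Let us apply the general formula (Theorem 7.7.1) to the
multiple-access channel $\mathbf{W}$ with "memory" given below. In this example the
information-spectrum approach is quite useful. First, let the input alphabets
and the output alphabet be $\mathcal{X}_1 = \mathcal{X}_2 = \mathcal{Y} = \{0, 1\}$. Let

$$\mathbf{Z} = \{Z^n = (Z_1^{(n)}, \cdots, Z_n^{(n)})\}_{n=1}^{\infty}$$

be an arbitrary nonstationary and nonergodic noise process, where $Z_i^{(n)}$ is a
random variable taking values in $\mathcal{Z} = \{0, 1\}$. Next, define the output

$$\mathbf{Y} = \{Y^n = (Y_1^{(n)}, \cdots, Y_n^{(n)})\}_{n=1}^{\infty}$$

from a multiple-access channel $\mathbf{W}$ corresponding to an input pair
$(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_I$ by

$$Y_i^{(n)} = X_{1i}^{(n)} \oplus X_{2i}^{(n)} \oplus Z_i^{(n)} \quad (i = 1, 2, \cdots, n), \tag{7.7.8}$$

where

$$\mathbf{X}_1 = \{X_1^n = (X_{11}^{(n)}, X_{12}^{(n)}, \cdots, X_{1n}^{(n)})\}_{n=1}^{\infty}, \quad \mathbf{X}_2 = \{X_2^n = (X_{21}^{(n)}, X_{22}^{(n)}, \cdots, X_{2n}^{(n)})\}_{n=1}^{\infty},$$

$\oplus$ denotes addition modulo 2, and $Z^n$ denotes a random variable independent
of $X_1^n$ and $X_2^n$ (additive channel). Then, the capacity region $C(\mathbf{W})$ of this
additive multiple-access channel $\mathbf{W}$ is given by

$$C(\mathbf{W}) = \{(R_1, R_2) \mid R_1 \ge 0,\ R_2 \ge 0,\ R_1 + R_2 \le \log 2 - \overline{H}(\mathbf{Z})\}, \tag{7.7.9}$$

where

$$\overline{H}(\mathbf{Z}) = \text{p-}\limsup_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{Z^n}(Z^n)}.$$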

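Hereafter, we establish (7.7.9). We use the following inequalities:

$$\underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}) \le \overline{H}(\mathbf{Y}) - \overline{H}(\mathbf{Y}|\mathbf{X}_1\mathbf{X}_2), \tag{7.7.10}$$

$$\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) \ge \underline{H}(\mathbf{Y}|\mathbf{X}_2) - \overline{H}(\mathbf{Y}|\mathbf{X}_1\mathbf{X}_2), \tag{7.7.11}$$

$$\underline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1) \ge \underline{H}(\mathbf{Y}|\mathbf{X}_1) - \overline{H}(\mathbf{Y}|\mathbf{X}_1\mathbf{X}_2), \tag{7.7.12}$$

$$\underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}) \ge \underline{H}(\mathbf{Y}) - \overline{H}(\mathbf{Y}|\mathbf{X}_1\mathbf{X}_2), \tag{7.7.13}$$

which can be easily obtained from the definitions of the limit superior and
the limit inferior in probability. Here, for simplicity, we set

$$\overline{H}(\mathbf{Y}|\mathbf{X}_1\mathbf{X}_2) = \text{p-}\limsup_{n\to\infty} \frac{1}{n}\log\frac{1}{W^n(Y^n|X_1^n, X_2^n)},$$

$$\underline{H}(\mathbf{Y}|\mathbf{X}_2) = \text{p-}\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{Y^n|X_2^n}(Y^n|X_2^n)},$$

$$\underline{H}(\mathbf{Y}|\mathbf{X}_1) = \text{p-}\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{Y^n|X_1^n}(Y^n|X_1^n)},$$

$$\underline{H}(\mathbf{Y}) = \text{p-}\liminf_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{Y^n}(Y^n)}.$$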



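First, we notice that

$$\overline{H}(\mathbf{Y}|\mathbf{X}_1\mathbf{X}_2) = \overline{H}(\mathbf{Z}) \tag{7.7.14}$$

since the noise process $\mathbf{Z}$ is additive. By using this, (7.7.10) and the inequality
$\overline{H}(\mathbf{Y}) \le \log|\mathcal{Y}| = \log 2$ (see Theorem 1.7.2), it holds that

$$\underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}) \le \log 2 - \overline{H}(\mathbf{Z})$$

for all $(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_I$. Hence, in view of Theorem 7.7.1 we obtain

$$C(\mathbf{W}) \subset \{(R_1, R_2) \mid R_1 \ge 0,\ R_2 \ge 0,\ R_1 + R_2 \le \log 2 - \overline{H}(\mathbf{Z})\}. \tag{7.7.15}$$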

Next, in order to establish the inclusion opposite to (7.7.15), let us consider
the pair of stationary memoryless inputs $(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_I$ subject to the
uniform distributions over $\mathcal{X}_1$ and $\mathcal{X}_2$, respectively. Since it holds that

$$\underline{H}(\mathbf{Y}|\mathbf{X}_2) = \underline{H}(\mathbf{Y}|\mathbf{X}_1) = \underline{H}(\mathbf{Y}) = \log 2$$

for this pair of the input processes, due to (7.7.11)-(7.7.14) we have

$$\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) \ge \log 2 - \overline{H}(\mathbf{Z}),$$

$$\underline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1) \ge \log 2 - \overline{H}(\mathbf{Z}),$$

$$\underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}) \ge \log 2 - \overline{H}(\mathbf{Z}).$$

Therefore, in view of Theorem 7.7.1 we obtain

$$C(\mathbf{W}) \supset \{(R_1, R_2) \mid R_1 \ge 0,\ R_2 \ge 0,\ R_1 + R_2 \le \log 2 - \overline{H}(\mathbf{Z})\}. \tag{7.7.16}$$

From the combination of (7.7.15) and (7.7.16) we finally obtain the formula
(7.7.9) on the capacity region for the additive multiple-access channel $\mathbf{W}$.
Notice that, if $\mathbf{Z}$ is the stationary memoryless process subject to the probability
distribution with $P_Z(0) = p$ in particular, we have $\overline{H}(\mathbf{Z}) = h(p)$, where
$h(p)$ denotes the binary entropy. Though the formula (7.7.9) can be immediately
obtained by using standard methods established in previous studies (for
example, the joint typical sequences and the Fano inequality) and without
using Theorem 7.7.1, we cannot obtain the formula (7.7.9) from such standard
methods for the cases that $\mathbf{Z}$ is a stationary irreducible Markov source,
a stationary ergodic source ($\overline{H}(\mathbf{Z}) = \lim_{n\to\infty} \frac{1}{n} H(Z^n)$ for these cases), a mixed
source (see §1.4) or a stationary source. □
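For the stationary memoryless special case mentioned at the end of the example, the region (7.7.9) can be checked numerically. The sketch below is only an illustration under two assumptions that are not part of the text: rates are measured in bits, so that $\log 2 = 1$, and the function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def in_additive_mac_region(r1, r2, p):
    """Membership test for (7.7.9) when Z is i.i.d. with P_Z(0) = p.

    In that case the sup-entropy rate of Z equals h(p); rates are in bits,
    so the sum-rate bound reads R1 + R2 <= 1 - h(p)."""
    return r1 >= 0.0 and r2 >= 0.0 and r1 + r2 <= 1.0 - binary_entropy(p)

p = 0.11
sum_rate_bound = 1.0 - binary_entropy(p)
print(sum_rate_bound)
print(in_additive_mac_region(0.2, 0.2, p), in_additive_mac_region(0.3, 0.3, p))
```

Only the sum rate is constrained here, so the region is a triangle rather than the pentagon that is typical for memoryless multiple-access channels; this is the redundancy noted in Remark 7.7.3.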



7.8 Stationary Memoryless Multiple-Access Channels

In this section let us apply the general formula (7.7.7) given in Theorem 7.7.1 
to stationary memoryless channels often studied in previous studies. The 
objective of this section is verification by direct computation that the formula 
(7.7.7) actually coincides with the well-known formula on the capacity region 
(e.g., cf. Cover and Thomas [17]). This verification is not trivial because 
the formula (7.7.7) is expressed in terms of the spectral (conditional) inf- 
mutual information rate completely different from the (conditional) mutual 
information. 

Before starting the computation for the verification, let us give the formula
for the capacity region of a stationary memoryless multiple-access channel $\mathbf{W}$
in a "computable" form, which becomes the goal of the computation. First,
consider an arbitrary random variable $Q$ taking values in an arbitrary finite
set $\mathcal{Q}$. Denote by $\mathcal{J}_M$ the set of all triplets of random variables $(X_1, X_2, Q)$
with the property that $X_1$ and $X_2$ are conditionally independent given $Q$,
where $X_1$ and $X_2$ are random variables taking values in input alphabets $\mathcal{X}_1$
and $\mathcal{X}_2$, respectively. Next, for each $(X_1, X_2, Q) \in \mathcal{J}_M$ define the corresponding
channel output $Y$ taking values in an output alphabet $\mathcal{Y}$ by

$$P_{QX_1X_2Y}(q, x_1, x_2, y) = P_Q(q) P_{X_1|Q}(x_1|q) P_{X_2|Q}(x_2|q) W(y|x_1, x_2), \tag{7.8.1}$$

where $W : \mathcal{X}_1 \times \mathcal{X}_2 \to \mathcal{Y}$ denotes the probability transition matrix of a
stationary memoryless multiple-access channel $\mathbf{W}$. Here, the channel
$\mathbf{W} = \{W^n(\cdot|\cdot,\cdot)\}_{n=1}^{\infty}$ is called memoryless if the transition probability $W^n(\cdot|\cdot,\cdot)$ is
written as

$$W^n(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2) = \prod_{i=1}^{n} W_i(y_i|x_{1i}, x_{2i}), \tag{7.8.2}$$

where $\mathbf{x}_1 = (x_{11}, \ldots, x_{1n})$, $\mathbf{x}_2 = (x_{21}, \ldots, x_{2n})$ and $\mathbf{y} = (y_1, \ldots, y_n)$.
Furthermore, if $W_i = W$ ($\forall i = 1, \ldots, n$) for some $W : \mathcal{X}_1 \times \mathcal{X}_2 \to \mathcal{Y}$, the channel is
called stationary and memoryless and is denoted by $\mathbf{W} = \{W\}$.

Now, for each $(X_1, X_2, Q) \in \mathcal{J}_M$ let us define the region $\mathcal{R}_W(X_1, X_2|Q)$
as the collection of all $(R_1, R_2)$ satisfying

$$0 \le R_1 \le I(X_1; Y|X_2 Q), \tag{7.8.3}$$

$$0 \le R_2 \le I(X_2; Y|X_1 Q), \tag{7.8.4}$$

$$R_1 + R_2 \le I(X_1 X_2; Y|Q), \tag{7.8.5}$$

where $I(X_1; Y|X_2 Q)$, $I(X_2; Y|X_1 Q)$ and $I(X_1 X_2; Y|Q)$ denote the (conditional)
mutual information (e.g., Cover and Thomas [17]). Then, we have the
following theorem:

Theorem 7.8.1 (Han [36]). For arbitrary input alphabets $\mathcal{X}_1$ and $\mathcal{X}_2$ and an
arbitrary output alphabet $\mathcal{Y}$ ($\mathcal{X}_1$, $\mathcal{X}_2$ and $\mathcal{Y}$ can be arbitrary sets, not restricted
to finite sets), the capacity region $C(\mathbf{W})$ of a stationary memoryless
multiple-access channel $\mathbf{W} = \{W\}$ is given by

$$C(\mathbf{W}) = \mathrm{Cl}\left(\bigcup_{(X_1, X_2, Q) \in \mathcal{J}_M} \mathcal{R}_W(X_1, X_2|Q)\right), \tag{7.8.6}$$

where $\mathrm{Cl}(\cdot)$ on the right-hand side denotes the closure operation and the
cardinality of $\mathcal{Q}$ can be restricted to at most 3. □

Remark 7.8.1. It is easy to check that the set on the right-hand side of
(7.8.6) is convex and closed. It was Ahlswede [3] who first proved
Theorem 7.8.1 for the case that $\mathcal{X}_1$, $\mathcal{X}_2$ and $\mathcal{Y}$ are finite sets. In this particular case
we do not need to take the closure on the right-hand side of (7.8.6). □
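To make the bounds (7.8.3)-(7.8.5) concrete, the sketch below computes the three mutual informations for a fixed pair of independent input distributions, that is, for a trivial time-sharing variable $Q$ with a single value. The binary adder channel $Y = X_1 + X_2$ and the uniform inputs are assumptions chosen for illustration and are not taken from the text; values are in bits.

```python
import math

def mac_mutual_informations(p1, p2, W):
    """Return (I(X1;Y|X2), I(X2;Y|X1), I(X1X2;Y)) in bits.

    p1, p2: dicts mapping input symbols to probabilities (independent inputs).
    W: dict mapping (x1, x2) to a dict {y: W(y|x1,x2)}."""
    ys = {y for dist in W.values() for y in dist}
    # Output distributions obtained by averaging out one or both inputs
    p_y = {y: sum(p1[a] * p2[b] * W[(a, b)].get(y, 0.0) for a in p1 for b in p2)
           for y in ys}
    p_y_x2 = {(y, b): sum(p1[a] * W[(a, b)].get(y, 0.0) for a in p1)
              for y in ys for b in p2}
    p_y_x1 = {(y, a): sum(p2[b] * W[(a, b)].get(y, 0.0) for b in p2)
              for y in ys for a in p1}
    i1 = i2 = i12 = 0.0
    for a in p1:
        for b in p2:
            for y, w in W[(a, b)].items():
                joint = p1[a] * p2[b] * w
                if joint == 0.0:
                    continue
                i1 += joint * math.log2(w / p_y_x2[(y, b)])
                i2 += joint * math.log2(w / p_y_x1[(y, a)])
                i12 += joint * math.log2(w / p_y[y])
    return i1, i2, i12

# Assumed toy channel: noiseless binary adder, Y = X1 + X2 in {0, 1, 2}
adder = {(a, b): {a + b: 1.0} for a in (0, 1) for b in (0, 1)}
uniform = {0: 0.5, 1: 0.5}
i1, i2, i12 = mac_mutual_informations(uniform, uniform, adder)
print(i1, i2, i12)
```

For this channel each individual bound is 1 bit while the sum-rate bound is 1.5 bits, so all three inequalities are active and the region is a pentagon.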

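Proof of Theorem 7.8.1.

The basic idea of the proof is to compute the right-hand side of
the general formula (7.7.7) and to obtain the right-hand side of the formula
(7.8.6) as a special case.

1) Direct part:

For any $(X_1, X_2, Q) \in \mathcal{J}_M$ ($Q \in \{1, 2, 3\}$) we define the pair of memoryless
input processes $(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_I$ as

$$\mathbf{X}_1 = \{X_1^n = (X_{11}^{(n)}, \cdots, X_{1n}^{(n)})\}_{n=1}^{\infty}, \quad \mathbf{X}_2 = \{X_2^n = (X_{21}^{(n)}, \cdots, X_{2n}^{(n)})\}_{n=1}^{\infty}$$

subject to

$$P_{X_{1i}^{(n)}}(x) = \begin{cases} P_{X_1|Q}(x|1) & \text{for } 1 \le i \le n_1, \\ P_{X_1|Q}(x|2) & \text{for } n_1 < i \le n_2, \\ P_{X_1|Q}(x|3) & \text{for } n_2 < i \le n, \end{cases}$$

and

$$P_{X_{2i}^{(n)}}(x) = \begin{cases} P_{X_2|Q}(x|1) & \text{for } 1 \le i \le n_1, \\ P_{X_2|Q}(x|2) & \text{for } n_1 < i \le n_2, \\ P_{X_2|Q}(x|3) & \text{for } n_2 < i \le n, \end{cases}$$

respectively, where we set

$$n_1 = \lceil P_Q(1) n \rceil, \quad n_2 = \lceil (P_Q(1) + P_Q(2)) n \rceil$$

(notice that this $(\mathbf{X}_1, \mathbf{X}_2)$ does not satisfy the "consistency" (see §1.3) as a
stochastic process). Denote by

$$\mathbf{Y} = \{Y^n = (Y_1^{(n)}, \cdots, Y_n^{(n)})\}_{n=1}^{\infty}$$

the channel output corresponding to the input pair $(\mathbf{X}_1, \mathbf{X}_2)$. In view of
Khintchin's law of large numbers (see Theorem 1.3.2 in Chapter 1) we have
the following properties on the convergence in probability: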



1 ni 

— ^log 

1 

— T) log 
-^1 i=tr+i 

1 ” 

^ log 

— 77,0 ^ 



n — U 2 . 



=ri2 + l 



ni 



ni 



^log 



Py(„) 

Py(„, („,(y/”i|xi”i) 

r(n)| ^(n) 



i=l 
ri2 



^ log 

^2 - ,=tr+i 

1 " 

S 1°S 



Py(„)i^(„,(y/"i|xi:*i) 

Py(„,i^(„,(y/"i|xiri) 

p^(.)i;,(.,(y/"i|xi”i) 



7(Xi;y|X2,Q = l), 
7(Xi;y|X2,Q = 2), 
7(Xi;y|X2,Q = 3); 
7(X2;y|Xi,Q = l), 
7(X2;y|Xi,Q = 2), 
7(X2;y|Xi,g = 3); 



490 7 Multi- Terminal Information Theory 






i=l 
ri2 



ri2 - ni 



n — ri2 



J2 






i=m+i 



y; 



(n) 



(y,(n)) 



7(XiX2;y|Q = 2), 






E i°g "E '7.xn)r ^ -/(^i^2;y|Q = 3), 



i=n2 + l 



^(rx) 






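where $Y$ denotes the channel output corresponding to the input pair $(X_1, X_2)$.
Hence, we obtain

$$\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) = \lim_{n\to\infty} \sum_{k=1}^{3} a_k^{(n)} I(X_1; Y|X_2, Q = k), \tag{7.8.7}$$

$$\underline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1) = \lim_{n\to\infty} \sum_{k=1}^{3} a_k^{(n)} I(X_2; Y|X_1, Q = k), \tag{7.8.8}$$

$$\underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}) = \lim_{n\to\infty} \sum_{k=1}^{3} a_k^{(n)} I(X_1 X_2; Y|Q = k), \tag{7.8.9}$$

where

$$a_1^{(n)} = \frac{n_1}{n}, \quad a_2^{(n)} = \frac{n_2 - n_1}{n}, \quad a_3^{(n)} = \frac{n - n_2}{n}.$$

Since it clearly holds that

$$\lim_{n\to\infty} a_k^{(n)} = P_Q(k) \quad (k = 1, 2, 3),$$

by substituting this into (7.8.7)-(7.8.9) we obtain

$$\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) = I(X_1; Y|X_2 Q), \tag{7.8.10}$$

$$\underline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1) = I(X_2; Y|X_1 Q), \tag{7.8.11}$$

$$\underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}) = I(X_1 X_2; Y|Q). \tag{7.8.12}$$

These equalities mean that for any $(X_1, X_2, Q) \in \mathcal{J}_M$ there exists an
$(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_I$ satisfying

$$\mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2) \supset \mathcal{R}_W(X_1, X_2|Q). \tag{7.8.13}$$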
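The time-sharing construction used in the direct part can be sketched numerically: the block of length $n$ is split at $n_1$ and $n_2$, and the fractions of positions governed by $Q = 1, 2, 3$ approach $P_Q(k)$ as $n$ grows. The distribution `p_q` below is an arbitrary assumption for illustration, and exact fractions are used so that the ceilings are not affected by floating-point rounding.

```python
import math
from fractions import Fraction

def block_boundaries(p_q, n):
    """n1 = ceil(P_Q(1) n) and n2 = ceil((P_Q(1) + P_Q(2)) n)."""
    n1 = math.ceil(p_q[0] * n)
    n2 = math.ceil((p_q[0] + p_q[1]) * n)
    return n1, n2

def block_fractions(p_q, n):
    """Fractions n1/n, (n2 - n1)/n and (n - n2)/n of the three sub-blocks."""
    n1, n2 = block_boundaries(p_q, n)
    return Fraction(n1, n), Fraction(n2 - n1, n), Fraction(n - n2, n)

# Assumed example distribution of the time-sharing variable Q on {1, 2, 3}
p_q = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))

for n in (7, 100, 1000):
    print(n, block_boundaries(p_q, n), block_fractions(p_q, n))
```

Each fraction differs from the corresponding $P_Q(k)$ by at most $1/n$, which is the convergence used to pass from (7.8.7)-(7.8.9) to (7.8.10)-(7.8.12).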



2) Converse part:

In order to prove the converse part, we use the following lemma, which is
a generalized version of Theorem 3.5.2 in Chapter 3 for the conditional case.

Lemma 7.8.1. Consider an arbitrary (not necessarily stationary memoryless)
multiple-access channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$, where the input alphabets
and the output alphabet of $\mathbf{W}$ can be arbitrary. Then, for any input pair
$(\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}, \mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty})$ and its corresponding channel output
$\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ it holds that

$$\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) \le \liminf_{n\to\infty} \frac{1}{n} I(X_1^n; Y^n|X_2^n), \tag{7.8.14}$$

$$\underline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1) \le \liminf_{n\to\infty} \frac{1}{n} I(X_2^n; Y^n|X_1^n), \tag{7.8.15}$$

$$\underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}) \le \liminf_{n\to\infty} \frac{1}{n} I(X_1^n X_2^n; Y^n). \tag{7.8.16}$$



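Proof. Though this lemma can be proved similarly to Theorem 3.5.2, we give
the proof of (7.8.14) for the reader's convenience. First, set

$$i(\mathbf{x}_1; \mathbf{y}|\mathbf{x}_2) = \log\frac{P_{Y^n|X_1^n X_2^n}(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2)}{P_{Y^n|X_2^n}(\mathbf{y}|\mathbf{x}_2)} \quad (\mathbf{x}_1 \in \mathcal{X}_1^n,\ \mathbf{x}_2 \in \mathcal{X}_2^n,\ \mathbf{y} \in \mathcal{Y}^n) \tag{7.8.17}$$

for simplicity and let $\gamma > 0$ be an arbitrarily small constant. Then, it holds
that

$$\begin{aligned}
\frac{1}{n} I(X_1^n; Y^n|X_2^n)
&= E\left\{\frac{1}{n} i(X_1^n; Y^n|X_2^n)\, \mathbf{1}\left[\frac{1}{n} i(X_1^n; Y^n|X_2^n) \le 0\right]\right\} \\
&\quad + E\left\{\frac{1}{n} i(X_1^n; Y^n|X_2^n)\, \mathbf{1}\left[0 < \frac{1}{n} i(X_1^n; Y^n|X_2^n) \le \underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) - \gamma\right]\right\} \\
&\quad + E\left\{\frac{1}{n} i(X_1^n; Y^n|X_2^n)\, \mathbf{1}\left[\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) - \gamma < \frac{1}{n} i(X_1^n; Y^n|X_2^n)\right]\right\} \\
&\ge E\left\{\frac{1}{n} i(X_1^n; Y^n|X_2^n)\, \mathbf{1}\left[\frac{1}{n} i(X_1^n; Y^n|X_2^n) \le 0\right]\right\} \\
&\quad + E\left\{\frac{1}{n} i(X_1^n; Y^n|X_2^n)\, \mathbf{1}\left[\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) - \gamma < \frac{1}{n} i(X_1^n; Y^n|X_2^n)\right]\right\} \\
&= \frac{1}{n} \sum_{(\mathbf{x}_1, \mathbf{x}_2) \in \mathcal{X}_1^n \times \mathcal{X}_2^n} P_{X_1^n X_2^n}(\mathbf{x}_1, \mathbf{x}_2) \\
&\qquad \times E\left\{ i(\mathbf{x}_1; Y^n|\mathbf{x}_2)\, \mathbf{1}\left[i(\mathbf{x}_1; Y^n|\mathbf{x}_2) \le 0\right] \,\middle|\, X_1^n = \mathbf{x}_1, X_2^n = \mathbf{x}_2 \right\} \\
&\quad + E\left\{\frac{1}{n} i(X_1^n; Y^n|X_2^n)\, \mathbf{1}\left[\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) - \gamma < \frac{1}{n} i(X_1^n; Y^n|X_2^n)\right]\right\}.
\end{aligned} \tag{7.8.18}$$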



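By applying Lemma 3.2.4 to the first term on the right-hand side of (7.8.18),
we obtain

$$\begin{aligned}
\frac{1}{n} I(X_1^n; Y^n|X_2^n)
&\ge \frac{1}{ne}\log\frac{1}{e} + E\left\{\frac{1}{n} i(X_1^n; Y^n|X_2^n)\, \mathbf{1}\left[\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) - \gamma < \frac{1}{n} i(X_1^n; Y^n|X_2^n)\right]\right\} \\
&\ge \frac{1}{ne}\log\frac{1}{e} + \left(\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) - \gamma\right) \Pr\left\{\frac{1}{n} i(X_1^n; Y^n|X_2^n) > \underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) - \gamma\right\}.
\end{aligned} \tag{7.8.19}$$

On the other hand, due to the definition of $\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2)$ we have

$$\lim_{n\to\infty} \Pr\left\{\frac{1}{n} i(X_1^n; Y^n|X_2^n) > \underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) - \gamma\right\} = 1.$$

Hence, by taking $\liminf$ of both sides of (7.8.19), it follows that

$$\liminf_{n\to\infty} \frac{1}{n} I(X_1^n; Y^n|X_2^n) \ge \underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) - \gamma.$$

Since $\gamma > 0$ is an arbitrarily small constant, we finally obtain

$$\liminf_{n\to\infty} \frac{1}{n} I(X_1^n; Y^n|X_2^n) \ge \underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2)$$

by letting $\gamma \to 0$. □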


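Now, let us return to the proof of the converse part. Since we consider a
stationary memoryless multiple-access channel $\mathbf{W} = \{W\}$, it holds that

$$I(X_1^n; Y^n|X_2^n) \le \sum_{i=1}^{n} I(X_{1i}^{(n)}; Y_i^{(n)}|X_{2i}^{(n)}), \tag{7.8.20}$$

$$I(X_2^n; Y^n|X_1^n) \le \sum_{i=1}^{n} I(X_{2i}^{(n)}; Y_i^{(n)}|X_{1i}^{(n)}), \tag{7.8.21}$$

$$I(X_1^n X_2^n; Y^n) \le \sum_{i=1}^{n} I(X_{1i}^{(n)} X_{2i}^{(n)}; Y_i^{(n)}), \tag{7.8.22}$$

where

$$X_1^n = (X_{11}^{(n)}, \cdots, X_{1n}^{(n)}), \quad X_2^n = (X_{21}^{(n)}, \cdots, X_{2n}^{(n)}), \quad Y^n = (Y_1^{(n)}, \cdots, Y_n^{(n)}).$$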



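Then, (7.8.20)-(7.8.22) and Lemma 7.8.1 imply

$$\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) \le \liminf_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} I(X_{1i}^{(n)}; Y_i^{(n)}|X_{2i}^{(n)}), \tag{7.8.23}$$

$$\underline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1) \le \liminf_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} I(X_{2i}^{(n)}; Y_i^{(n)}|X_{1i}^{(n)}), \tag{7.8.24}$$

$$\underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}) \le \liminf_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} I(X_{1i}^{(n)} X_{2i}^{(n)}; Y_i^{(n)}). \tag{7.8.25}$$

If we define the random variables $Q^{(n)}$, $X_1^{(n)}$, $X_2^{(n)}$, $Y^{(n)}$ by $P_{Q^{(n)}}(i) = 1/n$
($i = 1, \ldots, n$) and $X_1^{(n)} = X_{1i}^{(n)}$, $X_2^{(n)} = X_{2i}^{(n)}$, $Y^{(n)} = Y_i^{(n)}$ given $Q^{(n)} = i$,
(7.8.23)-(7.8.25) can be expressed as

$$\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) \le \liminf_{n\to\infty} I(X_1^{(n)}; Y^{(n)}|X_2^{(n)} Q^{(n)}), \tag{7.8.26}$$

$$\underline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1) \le \liminf_{n\to\infty} I(X_2^{(n)}; Y^{(n)}|X_1^{(n)} Q^{(n)}), \tag{7.8.27}$$

$$\underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}) \le \liminf_{n\to\infty} I(X_1^{(n)} X_2^{(n)}; Y^{(n)}|Q^{(n)}). \tag{7.8.28}$$

Then, for an arbitrarily small $\delta > 0$ there must exist an $n$ satisfying

$$\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) \le I(X_1^{(n)}; Y^{(n)}|X_2^{(n)} Q^{(n)}) + \delta,$$

$$\underline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1) \le I(X_2^{(n)}; Y^{(n)}|X_1^{(n)} Q^{(n)}) + \delta,$$

$$\underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}) \le I(X_1^{(n)} X_2^{(n)}; Y^{(n)}|Q^{(n)}) + \delta.$$

That is, there exists an $(X_1, X_2, Q) \in \mathcal{J}_M$ satisfying

$$\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) \le I(X_1; Y|X_2 Q) + \delta, \tag{7.8.29}$$

$$\underline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1) \le I(X_2; Y|X_1 Q) + \delta, \tag{7.8.30}$$

$$\underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}) \le I(X_1 X_2; Y|Q) + \delta, \tag{7.8.31}$$

where $Y$ denotes the channel output corresponding to the input pair $(X_1, X_2)$.
Notice here that from Eggleston's theorem [24] we can assume without loss
of generality that $Q$ satisfies $Q \in \{1, 2, 3\}$. By noticing that the set on the
right-hand side of (7.8.6) is closed and $\delta > 0$ can be arbitrarily small,
(7.8.29)-(7.8.31) mean that for any $(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_I$ we have

$$\mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2) \subset \mathrm{Cl}\left(\bigcup_{(X_1, X_2, Q) \in \mathcal{J}_M} \mathcal{R}_W(X_1, X_2|Q)\right). \tag{7.8.32}$$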



Example 7.8.1 (Nonstationary memoryless multiple-access channel).
Though the way of developing Theorem 7.8.1 from Theorem 7.7.1
is not so simple, the approach from the general multiple-access channel has
the advantage that it provides some knowledge of the capacity regions of
many channels to which standard methods established in previous studies
cannot be applied. Let us consider the following multiple-access channel $\mathbf{W}$
without stationarity as an example. Here, we assume that the input alphabets
$\mathcal{X}_1$ and $\mathcal{X}_2$ and the output alphabet $\mathcal{Y}$ of the channel $\mathbf{W}$ are finite sets
so as to avoid being too complicated. This
assumption is required so that we can use Chebyshev's inequality instead
of Khintchin's law of large numbers. Now, let us define the nonstationary
memoryless multiple-access channel $\mathbf{W} = \{W^n = \prod_{i=1}^{n} W_i\}_{n=1}^{\infty}$ by

$$W_i = \begin{cases} W_1 & \text{for } i \in J, \\ W_2 & \text{for } i \notin J, \end{cases}$$

where $W_1, W_2 : \mathcal{X}_1 \times \mathcal{X}_2 \to \mathcal{Y}$ are some fixed probability transition matrices
and we set

$$J = \{i \mid 2^{2k-1} \le i < 2^{2k},\ k = 1, 2, \cdots\} = \{2, 3, 8, 9, 10, 11, 12, 13, 14, 15, 32, 33, 34, 35, \cdots\}.$$



By analyzing the way of developing Theorem 7.8.1 from Theorem 7.7.1 
in detail, we can see that the capacity region C(W) of this W is given 
in the following way. First, let be the collection of all quintuples 
, xf \ Xp^ , Q) satisfying 

Q IQ IQ ’ 



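Next, for each such quintuple $(X_1^{(1)}, X_2^{(1)}, X_1^{(2)}, X_2^{(2)}, Q)$ we define the corresponding
outputs $Y^{(1)}$ and $Y^{(2)}$ by

$$P_{X_1^{(1)} X_2^{(1)} Y^{(1)} Q}(x_1, x_2, y_1, q) = P_{X_1^{(1)} X_2^{(1)} Q}(x_1, x_2, q)\, W_1(y_1|x_1, x_2),$$

$$P_{X_1^{(2)} X_2^{(2)} Y^{(2)} Q}(x_1, x_2, y_2, q) = P_{X_1^{(2)} X_2^{(2)} Q}(x_1, x_2, q)\, W_2(y_2|x_1, x_2),$$

respectively. Furthermore, for each such quintuple and each
$\frac{1}{3} \le \lambda \le \frac{2}{3}$ we define the region

$$\mathcal{R}_{W_1, W_2}(X_1^{(1)}, X_2^{(1)}, X_1^{(2)}, X_2^{(2)}|Q, \lambda)$$

as the collection of all $(R_1, R_2)$ satisfying

$$0 \le R_1 \le \lambda I(X_1^{(1)}; Y^{(1)}|X_2^{(1)} Q) + (1 - \lambda) I(X_1^{(2)}; Y^{(2)}|X_2^{(2)} Q), \tag{7.8.33}$$

$$0 \le R_2 \le \lambda I(X_2^{(1)}; Y^{(1)}|X_1^{(1)} Q) + (1 - \lambda) I(X_2^{(2)}; Y^{(2)}|X_1^{(2)} Q), \tag{7.8.34}$$

$$R_1 + R_2 \le \lambda I(X_1^{(1)} X_2^{(1)}; Y^{(1)}|Q) + (1 - \lambda) I(X_1^{(2)} X_2^{(2)}; Y^{(2)}|Q). \tag{7.8.35}$$

Notice here that we have

$$\liminf_{n\to\infty} \frac{1}{n} |J \cap \{1, 2, \cdots, n\}| = \frac{1}{3},$$

$$\limsup_{n\to\infty} \frac{1}{n} |J \cap \{1, 2, \cdots, n\}| = \frac{2}{3}$$

(see Remark 3.2.3 in §3.2). Then, we can conclude that the capacity region
is given by

$$C(\mathbf{W}) = \bigcap_{\frac{1}{3} \le \lambda \le \frac{2}{3}}\ \bigcup\ \mathcal{R}_{W_1, W_2}(X_1^{(1)}, X_2^{(1)}, X_1^{(2)}, X_2^{(2)}|Q, \lambda), \tag{7.8.36}$$

where the union is taken over all quintuples in the collection above and
the cardinality of $\mathcal{Q}$ can be restricted to at most 3.

By using the same argument, we can easily see that the capacity region
$C(\mathbf{W})$ of the nonstationary memoryless multiple-access channel
$\mathbf{W} = \{W^n = \prod_{i=1}^{n} W_i\}_{n=1}^{\infty}$ with finite input and output alphabets coincides with the
formula (7.8.6) if $\{W_i\}_{i=1}^{\infty}$ converges to a probability transition matrix $W$. □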
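The limit inferior $1/3$ and limit superior $2/3$ of the density of $J$ quoted in the example can be checked numerically. The sketch below is only an illustration; it relies on the observation that $2^{2k-1} \le i < 2^{2k}$ holds exactly when the binary representation of $i$ has an even number of digits, and the helper names are hypothetical.

```python
def in_J(i):
    """True iff 2**(2k-1) <= i < 2**(2k) for some k >= 1,
    i.e. iff the binary representation of i has even length."""
    return i >= 1 and i.bit_length() % 2 == 0

def densities(n_max):
    """Return the list of |J intersect {1..n}| / n for n = 1, ..., n_max."""
    count = 0
    out = []
    for n in range(1, n_max + 1):
        count += in_J(n)
        out.append(count / n)
    return out

first_elements = [i for i in range(1, 36) if in_J(i)]
d = densities(2 ** 16)
tail = d[2 ** 10:]          # discard small n, where the oscillation has not settled
print(first_elements)
print(min(tail), max(tail))
```

The density oscillates forever between values close to $1/3$ (at the end of a block outside $J$) and $2/3$ (at the end of a block inside $J$), which is why the region (7.8.36) involves an intersection over $\lambda$ rather than a single limiting fraction.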



7.9 Mixed Multiple-Access Channels

In the preceding section we have shown that the well-known formula for
the capacity region of stationary memoryless multiple-access channels
(Theorem 7.8.1) can be obtained from a general formula (Theorem 7.7.1) by direct
computation. In this section we consider the problem of finding the capacity
regions of multiple-access channels for which conventional standard methods
do not yield the capacity regions. One of the simplest and most familiar of such
channels is a mixed channel.

Letting $\mathbf{W}_1 = \{W_1^n\}_{n=1}^{\infty}$ and $\mathbf{W}_2 = \{W_2^n\}_{n=1}^{\infty}$ be arbitrary multiple-access
channels (not necessarily stationary ergodic) with arbitrary input and
output alphabets, we call the multiple-access channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ defined
by

$$W^n(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2) = \alpha_1 W_1^n(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2) + \alpha_2 W_2^n(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2) \tag{7.9.1}$$

the mixed channel of multiple-access channels $\mathbf{W}_1$ and $\mathbf{W}_2$, where $\alpha_1$ and $\alpha_2$
are constants satisfying $\alpha_1 > 0$, $\alpha_2 > 0$ and $\alpha_1 + \alpha_2 = 1$.

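We need some preparations for obtaining a formula of the capacity region
of the mixed channel. We consider two triplets of two input processes and
one output process

$$(\mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)}, \mathbf{Y}^{(i)}) = \{(X_1^{(i)n}, X_2^{(i)n}, Y^{(i)n})\}_{n=1}^{\infty} \quad (i = 1, 2)$$

and define their mixed process

$$(\mathbf{X}_1, \mathbf{X}_2, \mathbf{Y}) = \{(X_1^n, X_2^n, Y^n)\}_{n=1}^{\infty}$$

by

$$P_{X_1^n X_2^n Y^n}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) = \alpha_1 P_{X_1^{(1)n} X_2^{(1)n} Y^{(1)n}}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) + \alpha_2 P_{X_1^{(2)n} X_2^{(2)n} Y^{(2)n}}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}),$$

where $\mathbf{x}_1 \in \mathcal{X}_1^n$, $\mathbf{x}_2 \in \mathcal{X}_2^n$ and $\mathbf{y} \in \mathcal{Y}^n$. We define the (conditional) inf-mutual
information rates such as

$$\underline{I}(\mathbf{X}_1^{(1)}; \mathbf{Y}^{(1)}|\mathbf{X}_2^{(1)}), \quad \underline{I}(\mathbf{X}_1^{(2)}; \mathbf{Y}^{(2)}|\mathbf{X}_2^{(2)})$$

similarly to (7.7.1)-(7.7.3). Then, we have the following lemma corresponding
to Lemma 3.3.1 in Chapter 3.

Lemma 7.9.1.

$$\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) = \min\left(\underline{I}(\mathbf{X}_1^{(1)}; \mathbf{Y}^{(1)}|\mathbf{X}_2^{(1)}),\ \underline{I}(\mathbf{X}_1^{(2)}; \mathbf{Y}^{(2)}|\mathbf{X}_2^{(2)})\right), \tag{7.9.2}$$

$$\underline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1) = \min\left(\underline{I}(\mathbf{X}_2^{(1)}; \mathbf{Y}^{(1)}|\mathbf{X}_1^{(1)}),\ \underline{I}(\mathbf{X}_2^{(2)}; \mathbf{Y}^{(2)}|\mathbf{X}_1^{(2)})\right), \tag{7.9.3}$$

$$\underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}) = \min\left(\underline{I}(\mathbf{X}_1^{(1)}\mathbf{X}_2^{(1)}; \mathbf{Y}^{(1)}),\ \underline{I}(\mathbf{X}_1^{(2)}\mathbf{X}_2^{(2)}; \mathbf{Y}^{(2)})\right). \tag{7.9.4}$$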

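Proof. In order to prove (7.9.2) we use the identity*

$$\begin{aligned}
\log\frac{P_{Y^n|X_1^n X_2^n}(Y^n|X_1^n, X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)}
&= \log P_{X_1^n X_2^n Y^n}(X_1^n, X_2^n, Y^n) + \log P_{X_2^n}(X_2^n) \\
&\quad - \log P_{X_1^n X_2^n}(X_1^n, X_2^n) - \log P_{X_2^n Y^n}(X_2^n, Y^n)
\end{aligned}$$

and evaluate each term on the right-hand side similarly to the proof of
Lemma 1.4.1. We can prove this lemma by combining inequalities corresponding
to (1.4.8) and (1.4.11). We note, however, that $\text{p-}\limsup_{n\to\infty}$ and $\max$ must be
replaced with $\text{p-}\liminf_{n\to\infty}$ and $\min$, respectively. Equations (7.9.3) and (7.9.4)
can be proved in the same way (recall the comment given in the proof of
Lemma 3.3.1 as well). □

* Notice that we are using here too the symbolic expression as in (3.3.2).

If in Lemma 7.9.1 we define the outputs $\mathbf{Y}^{(1)} = \{Y^{(1)n}\}_{n=1}^{\infty}$ and
$\mathbf{Y}^{(2)} = \{Y^{(2)n}\}_{n=1}^{\infty}$ from multiple-access channels $\mathbf{W}_1$ and $\mathbf{W}_2$ corresponding to the
input pair

$$(\mathbf{X}_1^{(1)}, \mathbf{X}_2^{(1)}) = (\mathbf{X}_1^{(2)}, \mathbf{X}_2^{(2)}) = (\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}, \mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty})$$

by

$$P_{Y^{(1)n}|X_1^n X_2^n}(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2) = W_1^n(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2),$$

$$P_{Y^{(2)n}|X_1^n X_2^n}(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2) = W_2^n(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2),$$

respectively, Theorem 7.7.1 implies the following general formula for the
mixed channel $\mathbf{W}$ of two multiple-access channels $\mathbf{W}_1$ and $\mathbf{W}_2$:

Theorem 7.9.1 (Han [36]). Let $\mathcal{X}_1$ and $\mathcal{X}_2$ be arbitrary input alphabets and $\mathcal{Y}$
an arbitrary output alphabet ($\mathcal{X}_1$, $\mathcal{X}_2$ and $\mathcal{Y}$ can be arbitrary (not restricted to
finite) sets). The capacity region $C(\mathbf{W})$ of the mixed multiple-access channel
$\mathbf{W}$ of two multiple-access channels $\mathbf{W}_1$ and $\mathbf{W}_2$ defined by (7.9.1) is given
by

$$C(\mathbf{W}) = \bigcup_{(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_I} \mathcal{R}_{\mathbf{W}_1, \mathbf{W}_2}(\mathbf{X}_1, \mathbf{X}_2), \tag{7.9.5}$$

where $\mathcal{R}_{\mathbf{W}_1, \mathbf{W}_2}(\mathbf{X}_1, \mathbf{X}_2)$ is the collection of all $(R_1, R_2)$ satisfying

$$0 \le R_1 \le \min\left(\underline{I}(\mathbf{X}_1; \mathbf{Y}^{(1)}|\mathbf{X}_2),\ \underline{I}(\mathbf{X}_1; \mathbf{Y}^{(2)}|\mathbf{X}_2)\right), \tag{7.9.6}$$

$$0 \le R_2 \le \min\left(\underline{I}(\mathbf{X}_2; \mathbf{Y}^{(1)}|\mathbf{X}_1),\ \underline{I}(\mathbf{X}_2; \mathbf{Y}^{(2)}|\mathbf{X}_1)\right), \tag{7.9.7}$$

$$R_1 + R_2 \le \min\left(\underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}^{(1)}),\ \underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}^{(2)})\right). \tag{7.9.8}$$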
Next, let us apply Theorem 7.9.1 to the mixed multiple-access channel
$\mathbf{W}$ of two stationary memoryless multiple-access channels $\mathbf{W}_1 = \{W_1\}$ and
$\mathbf{W}_2 = \{W_2\}$. In order to obtain the capacity region (7.9.5) of such $\mathbf{W}$, for
each $(X_1, X_2, Q) \in \mathcal{J}_M$ we define the corresponding output variables $Y^{(1)}$ and
$Y^{(2)}$ (taking values in $\mathcal{Y}$) by

$$P_{X_1 X_2 Y^{(1)} Q}(x_1, x_2, y_1, q) = P_Q(q) P_{X_1|Q}(x_1|q) P_{X_2|Q}(x_2|q) W_1(y_1|x_1, x_2),$$

$$P_{X_1 X_2 Y^{(2)} Q}(x_1, x_2, y_2, q) = P_Q(q) P_{X_1|Q}(x_1|q) P_{X_2|Q}(x_2|q) W_2(y_2|x_1, x_2).$$

Then, we obtain the following theorem giving the capacity region of the mixed
channel expressed in a "computable" form.

Theorem 7.9.2 (Han [36]). Let $\mathcal{X}_1$ and $\mathcal{X}_2$ be arbitrary input alphabets and
$\mathcal{Y}$ an arbitrary output alphabet ($\mathcal{X}_1$, $\mathcal{X}_2$ and $\mathcal{Y}$ can be arbitrary (not restricted
to finite) sets). The capacity region $C(\mathbf{W})$ of the mixed multiple-access channel
$\mathbf{W}$ of two stationary memoryless multiple-access channels $\mathbf{W}_1 = \{W_1\}$
and $\mathbf{W}_2 = \{W_2\}$ is given by

$$C(\mathbf{W}) = \mathrm{Cl}\left(\bigcup_{(X_1, X_2, Q) \in \mathcal{J}_M} \mathcal{R}_{W_1, W_2}(X_1, X_2|Q)\right), \tag{7.9.9}$$

where $\mathcal{R}_{W_1, W_2}(X_1, X_2|Q)$ denotes the collection of all $(R_1, R_2)$ satisfying

$$0 \le R_1 \le \min\left(I(X_1; Y^{(1)}|X_2 Q),\ I(X_1; Y^{(2)}|X_2 Q)\right), \tag{7.9.10}$$

$$0 \le R_2 \le \min\left(I(X_2; Y^{(1)}|X_1 Q),\ I(X_2; Y^{(2)}|X_1 Q)\right), \tag{7.9.11}$$

$$R_1 + R_2 \le \min\left(I(X_1 X_2; Y^{(1)}|Q),\ I(X_1 X_2; Y^{(2)}|Q)\right) \tag{7.9.12}$$

and the cardinality of $\mathcal{Q}$ can be restricted to at most 6 (cf. Eggleston's
theorem [24]). □

Remark 7.9.1. The set on the right-hand side of (7.9.9) is convex. □



Proof of Theorem 7.9.2.

This theorem can be proved by starting from Theorem 7.9.1 and using the
same argument as that used to develop Theorem 7.8.1 from Theorem 7.7.1 (cf. §7.8). □

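The mixed multiple-access channel above can be generalized in the following
way. Given infinitely many arbitrary general channels

$$\mathbf{W}_i = \{W_i^n : \mathcal{X}_1^n \times \mathcal{X}_2^n \to \mathcal{Y}^n\}_{n=1}^{\infty} \quad (i = 1, 2, \cdots),$$

the multiple-access channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ defined by

$$W^n(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2) = \sum_{i=1}^{\infty} \alpha_i W_i^n(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2) \quad (\forall n = 1, 2, \cdots;\ \forall(\mathbf{x}_1, \mathbf{x}_2) \in \mathcal{X}_1^n \times \mathcal{X}_2^n,\ \forall\mathbf{y} \in \mathcal{Y}^n) \tag{7.9.13}$$

is called the mixed channel of a multiple-access channel family $\{\mathbf{W}_i\}_{i=1}^{\infty}$,
where $\alpha_i$ ($i = 1, 2, \cdots$) are constants satisfying

$$\sum_{i=1}^{\infty} \alpha_i = 1 \quad (\alpha_i \ge 0;\ \forall i = 1, 2, \cdots).$$

We prepare the following lemma for developing the capacity region for this
mixed channel. Letting $P_{X_1^{(i)n} X_2^{(i)n} Y^{(i)n}}$ be an arbitrary probability distribution
over $\mathcal{X}_1^n \times \mathcal{X}_2^n \times \mathcal{Y}^n$ for each $i = 1, 2, \cdots$, we define another probability
distribution $P_{X_1^n X_2^n Y^n}$ by

$$P_{X_1^n X_2^n Y^n}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) = \sum_{i=1}^{\infty} \alpha_i P_{X_1^{(i)n} X_2^{(i)n} Y^{(i)n}}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}). \tag{7.9.14}$$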



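Then, it holds that

$$P_{X_1^n}(\mathbf{x}_1) = \sum_{i=1}^{\infty} \alpha_i P_{X_1^{(i)n}}(\mathbf{x}_1), \tag{7.9.15}$$

$$P_{X_2^n}(\mathbf{x}_2) = \sum_{i=1}^{\infty} \alpha_i P_{X_2^{(i)n}}(\mathbf{x}_2), \tag{7.9.16}$$

$$P_{Y^n}(\mathbf{y}) = \sum_{i=1}^{\infty} \alpha_i P_{Y^{(i)n}}(\mathbf{y}), \tag{7.9.17}$$

etc. Setting

$$(\mathbf{X}_1^{(i)}, \mathbf{X}_2^{(i)}, \mathbf{Y}^{(i)}) = \{(X_1^{(i)n}, X_2^{(i)n}, Y^{(i)n})\}_{n=1}^{\infty} \quad (i = 1, 2, \cdots),$$

$$(\mathbf{X}_1, \mathbf{X}_2, \mathbf{Y}) = \{(X_1^n, X_2^n, Y^n)\}_{n=1}^{\infty},$$

we have the following lemma on the spectral (conditional) inf-mutual information
rates, which is a generalized version of Lemma 3.3.2 in Chapter 3.

Lemma 7.9.2.

$$\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) = \inf_{i:\alpha_i > 0} \underline{I}(\mathbf{X}_1^{(i)}; \mathbf{Y}^{(i)}|\mathbf{X}_2^{(i)}), \tag{7.9.18}$$

$$\underline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1) = \inf_{i:\alpha_i > 0} \underline{I}(\mathbf{X}_2^{(i)}; \mathbf{Y}^{(i)}|\mathbf{X}_1^{(i)}), \tag{7.9.19}$$

$$\underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}) = \inf_{i:\alpha_i > 0} \underline{I}(\mathbf{X}_1^{(i)}\mathbf{X}_2^{(i)}; \mathbf{Y}^{(i)}). \tag{7.9.20}$$

Proof. This lemma follows from computation of the information-spectra similar
to Lemma 1.4.3 (§1.4) and Lemma 3.3.2 (§3.3). □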

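If for each $i = 1, 2, \cdots$ we set $\mathbf{X}_1^{(i)} = \mathbf{X}_1$ and $\mathbf{X}_2^{(i)} = \mathbf{X}_2$ in Lemma 7.9.2
and denote by $\mathbf{Y}^{(i)} = \{Y^{(i)n}\}_{n=1}^{\infty}$ the output of the multiple-access channel $\mathbf{W}_i$
corresponding to the input pair $(\mathbf{X}_1, \mathbf{X}_2)$, Theorem 7.7.1 immediately yields the
following theorem:

Theorem 7.9.3. The capacity region $C(\mathbf{W})$ of the mixed multiple-access
channel $\mathbf{W}$ defined by (7.9.13) is given by

$$C(\mathbf{W}) = \bigcup_{(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_I} \mathcal{R}_{\{\mathbf{W}_i\}_{i=1}^{\infty}}(\mathbf{X}_1, \mathbf{X}_2), \tag{7.9.21}$$

where $\mathcal{R}_{\{\mathbf{W}_i\}_{i=1}^{\infty}}(\mathbf{X}_1, \mathbf{X}_2)$ is the collection of all $(R_1, R_2)$ satisfying

$$0 \le R_1 \le \inf_{i:\alpha_i > 0} \underline{I}(\mathbf{X}_1; \mathbf{Y}^{(i)}|\mathbf{X}_2), \tag{7.9.22}$$

$$0 \le R_2 \le \inf_{i:\alpha_i > 0} \underline{I}(\mathbf{X}_2; \mathbf{Y}^{(i)}|\mathbf{X}_1), \tag{7.9.23}$$

$$R_1 + R_2 \le \inf_{i:\alpha_i > 0} \underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}^{(i)}). \tag{7.9.24}$$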



Example 7.9.1. Let

$$\mathbf{Z}_i = \{Z_i^n = (Z_{i1}^{(n)}, \cdots, Z_{in}^{(n)})\}_{n=1}^{\infty} \quad (i = 1, 2, \cdots)$$

be infinitely many general sources with the source alphabet $\mathcal{Z} = \{0, 1\}$. For
each $i = 1, 2, \cdots$ define, similarly to Example 7.7.1, the "additive" multiple-access
channel $\mathbf{W}_i$ with $\mathbf{Z}_i$ as the noise process. Denote by $\mathbf{W}$ the mixed
multiple-access channel of these infinitely many multiple-access channels
$\mathbf{W}_i$ ($i = 1, 2, \cdots$). Then, by applying the argument described in Example 7.7.1 to
each term on the right-hand sides of (7.9.22)-(7.9.24) in Theorem 7.9.3, the
capacity region $C(\mathbf{W})$ of the mixed channel $\mathbf{W}$ turns out to be

$$C(\mathbf{W}) = \left\{(R_1, R_2) \,\middle|\, R_1 \ge 0,\ R_2 \ge 0,\ R_1 + R_2 \le \log 2 - \sup_{i:\alpha_i > 0} \overline{H}(\mathbf{Z}_i)\right\}. \tag{7.9.25}$$

This result can also be obtained from the fact that the multiple-access channel
$\mathbf{W}$ defined above is equivalent to the additive multiple-access channel $\mathbf{W}^*$
with the noise process $\mathbf{Z} = \{Z^n\}_{n=1}^{\infty}$ defined by

$$P_{Z^n}(\mathbf{z}) = \sum_{i=1}^{\infty} \alpha_i P_{Z_i^n}(\mathbf{z})$$

(see Example 7.7.1) and the property

$$\overline{H}(\mathbf{Z}) = \sup_{i:\alpha_i > 0} \overline{H}(\mathbf{Z}_i)$$

given in Lemma 1.4.3 in §1.4. □
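The sum-rate bound in (7.9.25) can be illustrated for a finite mixture of memoryless noise processes. The sketch below rests on assumptions that are not part of the example: each $\mathbf{Z}_i$ is taken to be i.i.d. Bernoulli with parameter `ps[i]`, so that $\overline{H}(\mathbf{Z}_i) = h(p_i)$, only finitely many components are present, and rates are in bits so that $\log 2 = 1$.

```python
import math

def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def mixed_sum_rate_bound(alphas, ps):
    """Sum-rate bound 1 - sup over {i : alpha_i > 0} of h(p_i), as in (7.9.25).

    Components with alpha_i = 0 are ignored, mirroring the index set of the sup."""
    worst = max(binary_entropy(p) for a, p in zip(alphas, ps) if a > 0)
    return 1.0 - worst

alphas = (0.5, 0.5, 0.0)     # assumed mixture weights
ps = (0.05, 0.11, 0.5)       # assumed noise parameters; the last one has weight 0
bound = mixed_sum_rate_bound(alphas, ps)
print(bound)
```

The bound is governed by the noisiest component with positive weight, no matter how small that weight is; the weights themselves do not enter the formula.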



Remark 7.9.2. Similarly to §3.3 in Chapter 3, we can define the compound
channel $\mathbf{W} = \{\mathbf{W}_i\}_{i=1}^{\infty}$ of infinitely many multiple-access channels
$\mathbf{W}_i$ ($i = 1, 2, \cdots$). We can see that the capacity region $C(\mathbf{W})$
of the compound channel coincides with the capacity region
of a mixed multiple-access channel with $\alpha_i > 0$
for all $i = 1, 2, \cdots$ given in Theorem 7.9.3 (see Theorem 3.3.5 in Chapter 3
as well). In particular, the capacity region of the compound channel of two
stationary memoryless multiple-access channels $\mathbf{W}_i = \{W_i\}$ ($i = 1, 2$) coincides
with the capacity region given in Theorem 7.9.2 (consider the case that
$\alpha_1 > 0$, $\alpha_2 > 0$ and $\alpha_i = 0$ for all $i \ge 3$). □



7.10 Proof of Theorem 7.7.1 

We give the proof of Theorem 7.7.1 (see §7.7) in this section. 
1) Direct part: 



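We prove this part by using the following lemma, which is the multi-user
version of Lemma 3.4.1 in §3.4.

Lemma 7.10.1. Let $(\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}, \mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty})$ be an arbitrary pair
of channel inputs satisfying $(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_I$. Denote by $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ the
output from a multiple-access channel $\mathbf{W} = \{W^n : \mathcal{X}_1^n \times \mathcal{X}_2^n \to \mathcal{Y}^n\}_{n=1}^{\infty}$
corresponding to $(\mathbf{X}_1, \mathbf{X}_2)$. Then, for arbitrary given positive integers $M_n^{(1)}$ and
$M_n^{(2)}$ there exists an $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code satisfying

$$\begin{aligned}
\varepsilon_n \le \Pr\Bigg\{ & \frac{1}{n}\log\frac{W^n(Y^n|X_1^n, X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le \frac{1}{n}\log M_n^{(1)} + \gamma \\
& \text{or } \frac{1}{n}\log\frac{W^n(Y^n|X_1^n, X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le \frac{1}{n}\log M_n^{(2)} + \gamma \\
& \text{or } \frac{1}{n}\log\frac{W^n(Y^n|X_1^n, X_2^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n}\log\left(M_n^{(1)} M_n^{(2)}\right) + \gamma \Bigg\} + 3e^{-n\gamma}
\end{aligned} \tag{7.10.1}$$

for all $n = 1, 2, \cdots$, where $\gamma > 0$ is an arbitrary constant.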



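Proof. For an arbitrary constant $\gamma > 0$ we define three sets $T_1^n, T_2^n, T_3^n \subset
\mathcal{X}_1^n \times \mathcal{X}_2^n \times \mathcal{Y}^n$ by

$$T_1^n = \left\{(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) \,\middle|\, \frac{1}{n}\log\frac{W^n(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2)}{P_{Y^n|X_2^n}(\mathbf{y}|\mathbf{x}_2)} > \frac{1}{n}\log M_n^{(1)} + \gamma\right\}, \tag{7.10.2}$$

$$T_2^n = \left\{(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) \,\middle|\, \frac{1}{n}\log\frac{W^n(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2)}{P_{Y^n|X_1^n}(\mathbf{y}|\mathbf{x}_1)} > \frac{1}{n}\log M_n^{(2)} + \gamma\right\}, \tag{7.10.3}$$

$$T_3^n = \left\{(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) \,\middle|\, \frac{1}{n}\log\frac{W^n(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2)}{P_{Y^n}(\mathbf{y})} > \frac{1}{n}\log\left(M_n^{(1)} M_n^{(2)}\right) + \gamma\right\} \tag{7.10.4}$$

and set

$$T^n = T_1^n \cap T_2^n \cap T_3^n. \tag{7.10.5}$$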

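a) Generation of a random code:

First, let us recall that in §7.6 the input processes and the output process
of a multiple-access channel $\mathbf{W}$ are denoted by

$$\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}, \quad \mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}, \quad \mathbf{Y} = \{Y^n\}_{n=1}^{\infty}.$$

For an arbitrary given pair of input processes

$$(\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}, \mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}) \in \mathcal{S}_I$$

we independently generate codewords $\mathbf{u}_1, \cdots, \mathbf{u}_{M_n^{(1)}} \in \mathcal{X}_1^n$ subject to
the probability distribution $P_{X_1^n}$. In the same way, we independently generate
codewords $\mathbf{v}_1, \cdots, \mathbf{v}_{M_n^{(2)}} \in \mathcal{X}_2^n$ subject to the probability distribution
$P_{X_2^n}$. We set

$$\mathcal{C}_1 = \{\mathbf{u}_1, \cdots, \mathbf{u}_{M_n^{(1)}}\}, \tag{7.10.6}$$

$$\mathcal{C}_2 = \{\mathbf{v}_1, \cdots, \mathbf{v}_{M_n^{(2)}}\}. \tag{7.10.7}$$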

b) Encoding: 

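An encoder $\varphi_n^{(1)}$ transforms a message $i \in \mathcal{M}_n^{(1)} = \{1, 2, \ldots, M_n^{(1)}\}$ into
a codeword $\mathbf{u}_i = \varphi_n^{(1)}(i)$ and inputs $\mathbf{u}_i$ to the input terminal 1 of the
channel. Simultaneously, an encoder $\varphi_n^{(2)}$ transforms a message $j \in \mathcal{M}_n^{(2)} =
\{1, 2, \ldots, M_n^{(2)}\}$ into a codeword $\mathbf{v}_j = \varphi_n^{(2)}(j)$ and inputs $\mathbf{v}_j$ to the input
terminal 2 of the channel (random coding).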

c) Decoding: 

After receiving a $\mathbf{y} \in \mathcal{Y}^n$, a decoder $\psi_n$ searches for a pair of messages
$(i, j)$ satisfying

\[
(\mathbf{u}_i, \mathbf{v}_j, \mathbf{y}) \in T_n.
\tag{7.10.8}
\]

If such $(i, j)$ is unique, the decoder outputs $\psi_n(\mathbf{y}) = (i, j)$. If there exists no
such $(i, j)$ or there exists more than one such $(i, j)$, the decoder outputs an arbitrary
pair of messages.

d) Evaluation of the error probability: 

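Let $(\mathbf{u}_i, \mathbf{v}_j)$ be a pair of codewords that are actually input to the
input terminals 1 and 2 of the channel, respectively, and $\mathbf{y}$ a channel output
corresponding to $(\mathbf{u}_i, \mathbf{v}_j)$. The decoding error occurs only if

\[
(\mathbf{u}_i, \mathbf{v}_j, \mathbf{y}) \notin T_n
\tag{7.10.9}
\]

or

\[
(\mathbf{u}_{i'}, \mathbf{v}_{j'}, \mathbf{y}) \in T_n
\tag{7.10.10}
\]

for some $(i', j')$ with $(i', j') \ne (i, j)$. Define the event $E_{kl}$ by

\[
E_{kl} = \{(\mathbf{u}_k, \mathbf{v}_l, \mathbf{y}) \in T_n\}.
\]

Then, the ensemble average $\bar{\varepsilon}_n$ of the error probability with respect to the
random code is upper-bounded as follows:

\[
\bar{\varepsilon}_n \le \frac{1}{M_n^{(1)}M_n^{(2)}} \sum_{i=1}^{M_n^{(1)}} \sum_{j=1}^{M_n^{(2)}} \Pr\{E_{ij}^c\}
+ \frac{1}{M_n^{(1)}M_n^{(2)}} \sum_{i=1}^{M_n^{(1)}} \sum_{j=1}^{M_n^{(2)}} \Pr\Big\{ \bigcup_{(i',j') \ne (i,j)} E_{i'j'} \Big\}.
\tag{7.10.11}
\]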



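Then, due to the symmetry of the random code, (7.10.11) can be expressed
as

\[
\bar{\varepsilon}_n \le \Pr\{E_{11}^c\} + \Pr\Big\{ \bigcup_{(i',j') \ne (1,1)} E_{i'j'} \Big\},
\tag{7.10.12}
\]

where $\mathbf{y}$ denotes the channel output corresponding to $(\mathbf{u}_1, \mathbf{v}_1)$. Hereafter, let
us evaluate the right-hand side of (7.10.12). Since $(\mathbf{u}_1, \mathbf{v}_1, \mathbf{y})$ is subject to the
probability distribution $P_{X_1^n X_2^n Y^n}$, the first term on the right-hand side of
(7.10.12) can be rewritten as

\[
A_n \equiv \Pr\{E_{11}^c\} = \Pr\{(X_1^n, X_2^n, Y^n) \notin T_n\}.
\tag{7.10.13}
\]

On the other hand, the second term on the right-hand side of (7.10.12) can
be evaluated as

\[
\Pr\Big\{ \bigcup_{(i',j') \ne (1,1)} E_{i'j'} \Big\}
\le \sum_{i' \ne 1} \Pr\{E_{i'1}\} + \sum_{j' \ne 1} \Pr\{E_{1j'}\} + \sum_{i' \ne 1} \sum_{j' \ne 1} \Pr\{E_{i'j'}\}.
\tag{7.10.14}
\]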

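We evaluate the first term on the right-hand side of (7.10.14) in the following
way. Since $\mathbf{u}_{i'}$ ($i' \ne 1$) and $(\mathbf{v}_1, \mathbf{y})$ are independently generated subject to
the probability distributions $P_{X_1^n}$ and $P_{X_2^n Y^n}$, respectively, we have

\[
\begin{aligned}
\Pr\{E_{i'1}\} &= \Pr\{(\mathbf{u}_{i'}, \mathbf{v}_1, \mathbf{y}) \in T_n\} \\
&= \sum_{(\mathbf{x}_1,\mathbf{x}_2,\mathbf{y}) \in T_n} P_{X_1^n}(\mathbf{x}_1) P_{X_2^n Y^n}(\mathbf{x}_2, \mathbf{y}) \\
&\le \sum_{(\mathbf{x}_1,\mathbf{x}_2,\mathbf{y}) \in T_1^n} P_{X_1^n}(\mathbf{x}_1) P_{X_2^n Y^n}(\mathbf{x}_2, \mathbf{y})
\end{aligned}
\tag{7.10.15}
\]

for $i' \ne 1$. By noticing

\[
P_{X_2^n Y^n}(\mathbf{x}_2, \mathbf{y}) < P_{X_2^n}(\mathbf{x}_2) P_{Y^n|X_1^n X_2^n}(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2) \frac{e^{-n\gamma}}{M_n^{(1)}}
\]

for $(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) \in T_1^n$, (7.10.15) leads to

\[
\Pr\{E_{i'1}\} \le \sum_{(\mathbf{x}_1,\mathbf{x}_2,\mathbf{y}) \in T_1^n} P_{X_1^n}(\mathbf{x}_1) P_{X_2^n}(\mathbf{x}_2) P_{Y^n|X_1^n X_2^n}(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2) \frac{e^{-n\gamma}}{M_n^{(1)}}
\le \frac{e^{-n\gamma}}{M_n^{(1)}}.
\]

Hence, we obtain

\[
\sum_{i' \ne 1} \Pr\{E_{i'1}\} \le \big(M_n^{(1)} - 1\big) \frac{e^{-n\gamma}}{M_n^{(1)}} \le M_n^{(1)} \frac{e^{-n\gamma}}{M_n^{(1)}} = e^{-n\gamma}.
\tag{7.10.16}
\]

Similarly, we also obtain

\[
\sum_{j' \ne 1} \Pr\{E_{1j'}\} \le e^{-n\gamma},
\tag{7.10.17}
\]

\[
\sum_{i' \ne 1} \sum_{j' \ne 1} \Pr\{E_{i'j'}\} \le e^{-n\gamma}.
\tag{7.10.18}
\]

Then, the combination of (7.10.14) and (7.10.16)-(7.10.18) yields

\[
\Pr\Big\{ \bigcup_{(i',j') \ne (1,1)} E_{i'j'} \Big\} \le 3e^{-n\gamma}.
\tag{7.10.19}
\]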

By substituting (7.10.13) and (7.10.19) into (7.10.12), we obtain

\[
\bar{\varepsilon}_n \le \Pr\{(X_1^n, X_2^n, Y^n) \notin T_n\} + 3e^{-n\gamma}.
\]

Consequently, there must exist at least one deterministic $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-
code satisfying the condition of the lemma. □
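The random-coding steps a) to c) above can be tried out on a toy channel. The following is a minimal sketch, not the construction of the text in full generality: it assumes a binary modulo-2 additive channel $Y = X_1 \oplus X_2 \oplus Z$ with i.i.d. Bernoulli($p$) noise and uniform inputs, so that $P_{Y^n|X_2^n}$, $P_{Y^n|X_1^n}$ and $P_{Y^n}$ all equal $2^{-n}$ and the three information densities in (7.10.2)-(7.10.4) coincide. The function names (`random_codebook`, `density_rate`, `in_T`, `decode`) are hypothetical, and the decoder returns `None` where the text lets it output an arbitrary pair.

```python
import math
import random


def random_codebook(M, n, rng):
    """Steps a)-b): draw M codewords independently, uniform on {0,1}^n."""
    return [tuple(rng.randrange(2) for _ in range(n)) for _ in range(M)]


def density_rate(x1, x2, y, p):
    """(1/n) log [W^n(y|x1,x2) / 2^{-n}] for the assumed mod-2 additive channel.

    Under uniform inputs the three denominators P_{Y|X2}, P_{Y|X1}, P_Y
    all equal 2^{-n}, so one value serves all three thresholds.
    """
    n = len(y)
    d = sum(a ^ b ^ c for a, b, c in zip(x1, x2, y))  # number of noise flips
    log_w = d * math.log(p) + (n - d) * math.log(1.0 - p)
    return (log_w + n * math.log(2.0)) / n


def in_T(x1, x2, y, p, M1, M2, gamma):
    """Membership test for T_n = T_1^n ∩ T_2^n ∩ T_3^n of (7.10.5)."""
    n = len(y)
    rate = density_rate(x1, x2, y, p)
    thresholds = (math.log(M1) / n, math.log(M2) / n, math.log(M1 * M2) / n)
    return all(rate > t + gamma for t in thresholds)


def decode(y, C1, C2, p, gamma):
    """Step c): return the unique (i, j) with (u_i, v_j, y) in T_n, else None."""
    hits = [(i, j)
            for i, u in enumerate(C1)
            for j, v in enumerate(C2)
            if in_T(u, v, y, p, len(C1), len(C2), gamma)]
    return hits[0] if len(hits) == 1 else None
```

With two hand-picked codebooks of length 8, a noiseless output of the pair $(0, 1)$ is decoded uniquely, while an output at Hamming distance 4 from every codeword sum falls outside $T_n$ for all pairs and is flagged as a failure.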



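Now, we are ready to prove the direct part. It suffices to prove that an
arbitrary $(R_1, R_2)$ satisfying

\[
0 \le R_1 \le \underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2),
\tag{7.10.20}
\]

\[
0 \le R_2 \le \underline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1),
\tag{7.10.21}
\]

\[
R_1 + R_2 \le \underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y})
\tag{7.10.22}
\]

is achievable. To this end, for an arbitrary $(R_1, R_2)$ satisfying (7.10.20)-
(7.10.22) and an arbitrarily small constant $\gamma > 0$, we define

\[
M_n^{(1)} = e^{n(R_1 - 2\gamma)},
\tag{7.10.23}
\]

\[
M_n^{(2)} = e^{n(R_2 - 2\gamma)}.
\tag{7.10.24}
\]

Then, Lemma 7.10.1 guarantees the existence of an $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code
satisfying

\[
\begin{aligned}
\varepsilon_n \le{} & \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le R_1 - \gamma
\ \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le R_2 - \gamma \\
& \qquad \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le R_1 + R_2 - 3\gamma \Big\} + 3e^{-n\gamma} \\
\le{} & \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le \underline{I}(\mathbf{X}_1;\mathbf{Y}|\mathbf{X}_2) - \gamma
\ \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le \underline{I}(\mathbf{X}_2;\mathbf{Y}|\mathbf{X}_1) - \gamma \\
& \qquad \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le \underline{I}(\mathbf{X}_1\mathbf{X}_2;\mathbf{Y}) - 3\gamma \Big\} + 3e^{-n\gamma} \\
\le{} & \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le \underline{I}(\mathbf{X}_1;\mathbf{Y}|\mathbf{X}_2) - \gamma \Big\} \\
& + \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le \underline{I}(\mathbf{X}_2;\mathbf{Y}|\mathbf{X}_1) - \gamma \Big\} \\
& + \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le \underline{I}(\mathbf{X}_1\mathbf{X}_2;\mathbf{Y}) - 3\gamma \Big\} + 3e^{-n\gamma}.
\end{aligned}
\tag{7.10.25}
\]

Since the definitions of the spectral (conditional) inf-mutual information rates
imply that all the terms on the right-hand side of (7.10.25) converge to zero
as $n \to \infty$, we have

\[
\lim_{n \to \infty} \varepsilon_n = 0.
\]

The claim of the direct part follows by recalling that $\gamma > 0$ can be arbitrarily
small and the capacity region forms a closed set.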
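As a worked check of how the thresholds in (7.10.25) arise, the code sizes (7.10.23) and (7.10.24) can be substituted into the three thresholds of (7.10.1). This is only the arithmetic implied by those definitions, written out for convenience:

```latex
\begin{aligned}
\frac{1}{n}\log M_n^{(1)} + \gamma &= (R_1 - 2\gamma) + \gamma = R_1 - \gamma, \\
\frac{1}{n}\log M_n^{(2)} + \gamma &= (R_2 - 2\gamma) + \gamma = R_2 - \gamma, \\
\frac{1}{n}\log\bigl(M_n^{(1)} M_n^{(2)}\bigr) + \gamma &= (R_1 + R_2 - 4\gamma) + \gamma = R_1 + R_2 - 3\gamma .
\end{aligned}
```

The second inequality of (7.10.25) then only uses (7.10.20)-(7.10.22) to replace $R_1$, $R_2$ and $R_1 + R_2$ by the corresponding inf-mutual information rates.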



2) Converse part: 

If the multiple-access channel is stationary and memoryless, the converse
part can easily be proved by using the Fano inequality (cf. Cover and Thomas
[17]). However, when we consider a general multiple-access channel treated in
Theorem 7.7.1, the Fano inequality does not give a sufficiently tight bound
on the error probability. Instead of the Fano inequality we use the following
lemma, which is a generalized version of Lemma 3.2.2 in Chapter 3
for the multi-user case.



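Lemma 7.10.2. Let $\mathbf{W} = \{W^n : \mathcal{X}_1^n \times \mathcal{X}_2^n \to \mathcal{Y}^n\}_{n=1}^{\infty}$ be an arbitrary multiple-
access channel. Then, for an arbitrary constant $\gamma > 0$ all the $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-
codes satisfy the inequality

\[
\begin{aligned}
\varepsilon_n \ge \Pr\Big\{ & \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le \frac{1}{n}\log M_n^{(1)} - \gamma \\
& \text{or } \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le \frac{1}{n}\log M_n^{(2)} - \gamma \\
& \text{or } \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n}\log\big(M_n^{(1)}M_n^{(2)}\big) - \gamma \Big\} - 3e^{-n\gamma}
\end{aligned}
\tag{7.10.26}
\]

for all $n = 1, 2, \ldots$. Here, letting $\varphi_n^{(1)}$ and $\varphi_n^{(2)}$ be the encoders, we denote by
$X_1^n$ and $X_2^n$ the input variables subject to the uniform distributions over

\[
\mathcal{C}_1 = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{M_n^{(1)}}\} \quad (\mathbf{u}_i = \varphi_n^{(1)}(i)),
\]

\[
\mathcal{C}_2 = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{M_n^{(2)}}\} \quad (\mathbf{v}_j = \varphi_n^{(2)}(j)),
\]

respectively, and $Y^n$ the channel output corresponding to $(X_1^n, X_2^n)$.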



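Proof. Though the basic idea for proving the lemma can be found in Lemma 3.2.2,
we give the proof here for the reader's convenience. Set $\beta_n = e^{-n\gamma}$ and note
that

\[
\frac{W^n(\mathbf{y}|\mathbf{x}_1,\mathbf{x}_2)}{P_{Y^n|X_2^n}(\mathbf{y}|\mathbf{x}_2)}
= \frac{P_{X_1^n|Y^n X_2^n}(\mathbf{x}_1|\mathbf{y},\mathbf{x}_2)}{P_{X_1^n|X_2^n}(\mathbf{x}_1|\mathbf{x}_2)}
= M_n^{(1)} P_{X_1^n|Y^n X_2^n}(\mathbf{x}_1|\mathbf{y},\mathbf{x}_2),
\]

\[
\frac{W^n(\mathbf{y}|\mathbf{x}_1,\mathbf{x}_2)}{P_{Y^n|X_1^n}(\mathbf{y}|\mathbf{x}_1)}
= \frac{P_{X_2^n|Y^n X_1^n}(\mathbf{x}_2|\mathbf{y},\mathbf{x}_1)}{P_{X_2^n|X_1^n}(\mathbf{x}_2|\mathbf{x}_1)}
= M_n^{(2)} P_{X_2^n|Y^n X_1^n}(\mathbf{x}_2|\mathbf{y},\mathbf{x}_1),
\]

\[
\frac{W^n(\mathbf{y}|\mathbf{x}_1,\mathbf{x}_2)}{P_{Y^n}(\mathbf{y})}
= \frac{P_{X_1^n X_2^n|Y^n}(\mathbf{x}_1,\mathbf{x}_2|\mathbf{y})}{P_{X_1^n X_2^n}(\mathbf{x}_1,\mathbf{x}_2)}
= M_n^{(1)} M_n^{(2)} P_{X_1^n X_2^n|Y^n}(\mathbf{x}_1,\mathbf{x}_2|\mathbf{y}).
\]

Define three sets $L_n^{(1)}, L_n^{(2)}, L_n^{(3)} \subset \mathcal{X}_1^n \times \mathcal{X}_2^n \times \mathcal{Y}^n$ by

\[
\begin{aligned}
L_n^{(1)} &= \{(\mathbf{x}_1,\mathbf{x}_2,\mathbf{y}) \mid P_{X_1^n|Y^n X_2^n}(\mathbf{x}_1|\mathbf{y},\mathbf{x}_2) \le \beta_n\}, \\
L_n^{(2)} &= \{(\mathbf{x}_1,\mathbf{x}_2,\mathbf{y}) \mid P_{X_2^n|Y^n X_1^n}(\mathbf{x}_2|\mathbf{y},\mathbf{x}_1) \le \beta_n\}, \\
L_n^{(3)} &= \{(\mathbf{x}_1,\mathbf{x}_2,\mathbf{y}) \mid P_{X_1^n X_2^n|Y^n}(\mathbf{x}_1,\mathbf{x}_2|\mathbf{y}) \le \beta_n\}
\end{aligned}
\]

and set

\[
L_n = L_n^{(1)} \cup L_n^{(2)} \cup L_n^{(3)}.
\tag{7.10.27}
\]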

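Then, (7.10.26) can be expressed as

\[
P_{X_1^n X_2^n Y^n}\{L_n\} \le \varepsilon_n + 3\beta_n.
\tag{7.10.28}
\]

Hereafter, we establish (7.10.28). We denote by $\mathcal{D}_{ij}$ the decoding region for
a pair of messages $(\mathbf{u}_i, \mathbf{v}_j)$ and set

\[
\begin{aligned}
B_{ij}^{(1)} &= \{\mathbf{y} \in \mathcal{Y}^n \mid P_{X_1^n|Y^n X_2^n}(\mathbf{u}_i|\mathbf{y},\mathbf{v}_j) \le \beta_n\}, \\
B_{ij}^{(2)} &= \{\mathbf{y} \in \mathcal{Y}^n \mid P_{X_2^n|Y^n X_1^n}(\mathbf{v}_j|\mathbf{y},\mathbf{u}_i) \le \beta_n\}, \\
B_{ij}^{(3)} &= \{\mathbf{y} \in \mathcal{Y}^n \mid P_{X_1^n X_2^n|Y^n}(\mathbf{u}_i,\mathbf{v}_j|\mathbf{y}) \le \beta_n\}.
\end{aligned}
\]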


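Then, writing $B_{ij} = B_{ij}^{(1)} \cup B_{ij}^{(2)} \cup B_{ij}^{(3)}$ and letting all double sums run over
$i = 1, \ldots, M_n^{(1)}$ and $j = 1, \ldots, M_n^{(2)}$, it follows that

\[
\begin{aligned}
P_{X_1^n X_2^n Y^n}\{L_n\}
&= \sum_{i}\sum_{j} P_{X_1^n X_2^n Y^n}(\mathbf{u}_i, \mathbf{v}_j, B_{ij}) \\
&= \sum_{i}\sum_{j} P_{X_1^n X_2^n Y^n}(\mathbf{u}_i, \mathbf{v}_j, B_{ij} \cap \mathcal{D}_{ij}^c)
 + \sum_{i}\sum_{j} P_{X_1^n X_2^n Y^n}(\mathbf{u}_i, \mathbf{v}_j, B_{ij} \cap \mathcal{D}_{ij}) \\
&\le \sum_{i}\sum_{j} P_{X_1^n X_2^n Y^n}(\mathbf{u}_i, \mathbf{v}_j, \mathcal{D}_{ij}^c)
 + \sum_{i}\sum_{j} P_{X_1^n X_2^n Y^n}(\mathbf{u}_i, \mathbf{v}_j, B_{ij} \cap \mathcal{D}_{ij}) \\
&= \varepsilon_n + \sum_{i}\sum_{j} P_{X_1^n X_2^n Y^n}(\mathbf{u}_i, \mathbf{v}_j, B_{ij} \cap \mathcal{D}_{ij}) \\
&\le \varepsilon_n + \sum_{k=1}^{3} \sum_{i}\sum_{j} P_{X_1^n X_2^n Y^n}(\mathbf{u}_i, \mathbf{v}_j, B_{ij}^{(k)} \cap \mathcal{D}_{ij}) \\
&\le \varepsilon_n + \beta_n \sum_{i}\sum_{j}\sum_{\mathbf{y} \in \mathcal{D}_{ij}} P_{Y^n X_2^n}(\mathbf{y}, \mathbf{v}_j)
 + \beta_n \sum_{i}\sum_{j}\sum_{\mathbf{y} \in \mathcal{D}_{ij}} P_{Y^n X_1^n}(\mathbf{y}, \mathbf{u}_i)
 + \beta_n \sum_{i}\sum_{j}\sum_{\mathbf{y} \in \mathcal{D}_{ij}} P_{Y^n}(\mathbf{y}) \\
&= \varepsilon_n + \beta_n \sum_{j} P_{Y^n X_2^n}\Big\{ \Big(\bigcup_{i} \mathcal{D}_{ij}\Big) \times \{\mathbf{v}_j\} \Big\}
 + \beta_n \sum_{i} P_{Y^n X_1^n}\Big\{ \Big(\bigcup_{j} \mathcal{D}_{ij}\Big) \times \{\mathbf{u}_i\} \Big\}
 + \beta_n P_{Y^n}\Big\{ \bigcup_{i}\bigcup_{j} \mathcal{D}_{ij} \Big\} \\
&\le \varepsilon_n + \beta_n \sum_{j} P_{X_2^n}(\mathbf{v}_j) + \beta_n \sum_{i} P_{X_1^n}(\mathbf{u}_i) + \beta_n P_{Y^n}(\mathcal{Y}^n) \\
&= \varepsilon_n + 3\beta_n,
\end{aligned}
\]

where the sixth line uses the definitions of $B_{ij}^{(1)}$, $B_{ij}^{(2)}$, $B_{ij}^{(3)}$ and the seventh uses
the disjointness of the decoding regions $\mathcal{D}_{ij}$. This completes the proof of the lemma.

□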
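Inequality (7.10.28) can be checked by exhaustive enumeration on a very small code. The sketch below is illustrative only: it assumes a binary modulo-2 additive channel with i.i.d. Bernoulli($p$) noise, uses a maximum-likelihood decoder to define the decoding regions $\mathcal{D}_{ij}$ (the lemma holds for any decoder), and the function names `w` and `check_lemma` are hypothetical.

```python
import itertools
import math


def w(y, x1, x2, p):
    """W^n(y|x1,x2) for the assumed channel Y = X1 xor X2 xor Z, Z i.i.d. Bernoulli(p)."""
    d = sum(a ^ b ^ c for a, b, c in zip(x1, x2, y))
    return p ** d * (1.0 - p) ** (len(y) - d)


def check_lemma(C1, C2, p, gamma):
    """Return (P{L_n}, eps_n + 3*beta_n) computed exactly over all outputs y."""
    n, M1, M2 = len(C1[0]), len(C1), len(C2)
    beta = math.exp(-n * gamma)
    pairs = [(i, j) for i in range(M1) for j in range(M2)]
    prior = 1.0 / (M1 * M2)            # uniform messages, as in the lemma
    eps, prob_L = 0.0, 0.0
    for y in itertools.product((0, 1), repeat=n):
        lik = {(i, j): w(y, C1[i], C2[j], p) for (i, j) in pairs}
        p_y = prior * sum(lik.values())
        decoded = max(pairs, key=lambda ij: lik[ij])   # ML decoding region of y
        for (i, j) in pairs:
            joint = prior * lik[(i, j)]
            if (i, j) != decoded:
                eps += joint                           # y lies outside D_ij
            post1 = lik[(i, j)] / sum(lik[(k, j)] for k in range(M1))
            post2 = lik[(i, j)] / sum(lik[(i, l)] for l in range(M2))
            post12 = joint / p_y
            if min(post1, post2, post12) <= beta:      # (u_i, v_j, y) in L_n
                prob_L += joint
    return prob_L, eps + 3.0 * beta
```

For example, `check_lemma([(0,0,0),(1,1,1)], [(0,0,0),(0,1,1)], 0.1, 1.0)` returns a pair whose first entry does not exceed the second, as (7.10.28) requires.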



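Now, we give the proof of the converse part. Suppose that a rate pair
$(R_1, R_2)$ is achievable. Then, due to the definition of the achievability, for
any constant $\gamma > 0$ there exists an $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code satisfying

\[
\frac{1}{n}\log M_n^{(1)} \ge R_1 - \gamma,
\tag{7.10.29}
\]

\[
\frac{1}{n}\log M_n^{(2)} \ge R_2 - \gamma
\tag{7.10.30}
\]

for all sufficiently large $n$ and $\varepsilon_n \to 0$ as $n \to \infty$. We define input processes
and an output process

\[
\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}, \quad \mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}, \quad \mathbf{Y} = \{Y^n\}_{n=1}^{\infty}
\]

on this code as are defined in Lemma 7.10.2. Clearly, $(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_I$. By
substituting (7.10.29) and (7.10.30) into (7.10.26) in Lemma 7.10.2, we can
obtain

\[
\varepsilon_n \ge \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le R_1 - 2\gamma \Big\} - 3e^{-n\gamma},
\tag{7.10.31}
\]

\[
\varepsilon_n \ge \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le R_2 - 2\gamma \Big\} - 3e^{-n\gamma},
\tag{7.10.32}
\]

\[
\varepsilon_n \ge \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le R_1 + R_2 - 3\gamma \Big\} - 3e^{-n\gamma}.
\tag{7.10.33}
\]

Now, let us assume that (7.7.4) does not hold. Then, we can choose a suffi-
ciently small $\gamma > 0$ satisfying

\[
R_1 - 3\gamma > \underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2).
\tag{7.10.34}
\]

By substituting (7.10.34) into the right-hand side of (7.10.31), it follows that

\[
\varepsilon_n \ge \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le \underline{I}(\mathbf{X}_1;\mathbf{Y}|\mathbf{X}_2) + \gamma \Big\} - 3e^{-n\gamma}.
\tag{7.10.35}
\]

However, since $\varepsilon_n \to 0$ and $e^{-n\gamma} \to 0$ as $n \to \infty$, (7.10.35) contradicts the def-
inition of the spectral conditional inf-mutual information rate $\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2)$.
This means that (7.7.4) must hold. Similarly, we can prove (7.7.5) and (7.7.6)
by using (7.10.32) and (7.10.33), respectively. □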



7.11 ε-Coding for Multiple-Access Channel

So far we have considered coding of multiple-access channels subject to the
requirement that the error probability $\varepsilon_n$ satisfies $\lim_{n \to \infty} \varepsilon_n = 0$. In this section
we weaken this requirement; we consider coding of general multiple-access
channels

\[
\mathbf{W} = \{W^n : \mathcal{X}_1^n \times \mathcal{X}_2^n \to \mathcal{Y}^n\}_{n=1}^{\infty}
\]

subject to the requirement

\[
\limsup_{n \to \infty} \varepsilon_n \le \varepsilon
\]

for an arbitrarily fixed $0 \le \varepsilon < 1$. We can expect the capacity region to become
larger under this weakened requirement on the error probability.

We begin with definitions.






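Definition 7.11.1.

A rate pair $(R_1, R_2)$ is ε-achievable if and only if
there exists an $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code satisfying $\limsup_{n \to \infty} \varepsilon_n \le \varepsilon$,

\[
\liminf_{n \to \infty} \frac{1}{n}\log M_n^{(1)} \ge R_1 \quad \text{and} \quad \liminf_{n \to \infty} \frac{1}{n}\log M_n^{(2)} \ge R_2.
\]

Definition 7.11.2 (ε-capacity region).

\[
C(\varepsilon|\mathbf{W}) = \{(R_1, R_2) \mid R_1 \ge 0,\ R_2 \ge 0 \text{ and } (R_1, R_2) \text{ is } \varepsilon\text{-achievable}\}.
\]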



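In order to determine $C(\varepsilon|\mathbf{W})$, letting $\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$ and $\mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}$
be arbitrary general sources satisfying $(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_I$, we define the output
$\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ of a multiple-access channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ with $(\mathbf{X}_1, \mathbf{X}_2)$ as
the input pair by

\[
P_{X_1^n X_2^n Y^n}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) = P_{X_1^n}(\mathbf{x}_1) P_{X_2^n}(\mathbf{x}_2) W^n(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2).
\]

Furthermore, we define a function $J(R_1, R_2|\mathbf{X}_1, \mathbf{X}_2)$ with two variables by

Definition 7.11.3.

\[
\begin{aligned}
J(R_1, R_2|\mathbf{X}_1, \mathbf{X}_2) = \limsup_{n \to \infty} \Pr\Big\{ & \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le R_1 \\
& \text{or } \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le R_2 \\
& \text{or } \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le R_1 + R_2 \Big\},
\end{aligned}
\tag{7.11.1}
\]

which directly depends on the distributions of the (conditional) mutual in-
formation spectra. Then, we have the following theorem, which extends
Theorem 3.4.1 in Chapter 3 to the multi-user case.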



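Theorem 7.11.1 (Han [36]). The ε-capacity region $C(\varepsilon|\mathbf{W})$ of a multiple-
access channel $\mathbf{W}$ is given by

\[
C(\varepsilon|\mathbf{W}) = \bigcup_{(\mathbf{X}_1,\mathbf{X}_2) \in \mathcal{S}_I} \mathrm{Cl}\big(\{(R_1, R_2) \mid R_1 \ge 0,\ R_2 \ge 0,\ J(R_1, R_2|\mathbf{X}_1, \mathbf{X}_2) \le \varepsilon\}\big)
\quad (0 \le \forall \varepsilon < 1),
\tag{7.11.2}
\]

where $\mathrm{Cl}(\cdot)$ on the right-hand side denotes the closure operation.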
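To get a feeling for the function $J$ in (7.11.1), the probability inside the $\limsup$ can be evaluated exactly at finite $n$ in a simple special case. The sketch below assumes, purely for illustration, a binary modulo-2 additive channel with i.i.d. Bernoulli($p$) noise and uniform inputs; then all three information densities coincide and depend only on the number $d$ of noise flips, so the event reduces to a binomial tail. The function names are hypothetical.

```python
import math


def binom_pmf(n, k, q):
    """Binomial(n, q) probability of k, computed in log space for stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(q) + (n - k) * math.log(1.0 - q))
    return math.exp(log_pmf)


def j_finite_n(R1, R2, n, p):
    """Probability in (7.11.1) at blocklength n for the assumed channel.

    With uniform inputs the three densities all equal
    log 2 + (d/n) log p + (1 - d/n) log(1 - p), d = number of noise flips,
    so the 'or' of the three events is the event density <= max threshold.
    """
    threshold = max(R1, R2, R1 + R2)
    total = 0.0
    for d in range(n + 1):
        density = math.log(2.0) + (d * math.log(p) + (n - d) * math.log(1.0 - p)) / n
        if density <= threshold:
            total += binom_pmf(n, d, p)
    return total
```

For $p = 0.1$ the value is close to 0 when $R_1 + R_2$ is below $\log 2 - h(0.1) \approx 0.368$ nats and close to 1 above it, which is the step behaviour expected of a memoryless channel.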



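Proof.

1) Direct part:

Consider an arbitrary input pair $(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_I$. For an arbitrary rate
pair $(R_1, R_2)$ satisfying

\[
(R_1, R_2) \in \mathrm{Cl}\big(\{(R_1, R_2) \mid R_1 \ge 0,\ R_2 \ge 0,\ J(R_1, R_2|\mathbf{X}_1, \mathbf{X}_2) \le \varepsilon\}\big)
\tag{7.11.3}
\]

and an arbitrarily small constant $\gamma > 0$ define

\[
M_n^{(1)} = e^{n(R_1 - 2\gamma)}, \qquad M_n^{(2)} = e^{n(R_2 - 2\gamma)}.
\]

Then, Lemma 7.10.1 guarantees the existence of an $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code
satisfying

\[
\begin{aligned}
\varepsilon_n \le{} & \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le R_1 - \gamma
\ \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le R_2 - \gamma \\
& \qquad \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le R_1 + R_2 - 3\gamma \Big\} + 3e^{-n\gamma} \\
\le{} & \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le R_1 - \gamma
\ \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le R_2 - \gamma \\
& \qquad \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le R_1 + R_2 - 2\gamma \Big\} + 3e^{-n\gamma},
\end{aligned}
\tag{7.11.4}
\]

which implies

\[
\limsup_{n \to \infty} \varepsilon_n \le J(R_1 - \gamma, R_2 - \gamma|\mathbf{X}_1, \mathbf{X}_2) \le \varepsilon.
\tag{7.11.5}
\]

Since $\gamma > 0$ is arbitrarily small, (7.11.5) means that any rate pair $(R_1, R_2)$
satisfying (7.11.3) is ε-achievable.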



2) Converse part: 

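Suppose that a rate pair $(R_1, R_2)$ is ε-achievable. Then, due to the def-
inition of the ε-achievability, for any constant $\gamma > 0$ there must exist an
$(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code satisfying

\[
\frac{1}{n}\log M_n^{(1)} \ge R_1 - \gamma,
\tag{7.11.6}
\]

\[
\frac{1}{n}\log M_n^{(2)} \ge R_2 - \gamma
\tag{7.11.7}
\]

for all sufficiently large $n$ and

\[
\limsup_{n \to \infty} \varepsilon_n \le \varepsilon.
\tag{7.11.8}
\]

Let us define input processes and an output process

\[
\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}, \quad \mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}, \quad \mathbf{Y} = \{Y^n\}_{n=1}^{\infty}
\]

on this code as are defined in Lemma 7.10.2. Clearly, $(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_I$. By
substituting (7.11.6) and (7.11.7) into (7.10.26) in Lemma 7.10.2, we can
obtain

\[
\begin{aligned}
\varepsilon_n \ge{} & \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le R_1 - 2\gamma
\ \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le R_2 - 2\gamma \\
& \qquad \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le R_1 + R_2 - 3\gamma \Big\} - 3e^{-n\gamma} \\
\ge{} & \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le R_1 - 2\gamma
\ \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le R_2 - 2\gamma \\
& \qquad \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le R_1 + R_2 - 4\gamma \Big\} - 3e^{-n\gamma}.
\end{aligned}
\tag{7.11.9}
\]

Hence, in view of (7.11.8) and (7.11.9) we have

\[
J(R_1 - 2\gamma, R_2 - 2\gamma|\mathbf{X}_1, \mathbf{X}_2) \le \limsup_{n \to \infty} \varepsilon_n \le \varepsilon.
\tag{7.11.10}
\]

Since $\gamma > 0$ is arbitrary, (7.11.10) means

\[
(R_1, R_2) \in \mathrm{Cl}\big(\{(R_1, R_2) \mid R_1 \ge 0,\ R_2 \ge 0,\ J(R_1, R_2|\mathbf{X}_1, \mathbf{X}_2) \le \varepsilon\}\big).
\]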



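Example 7.11.1. Let us apply Theorem 7.11.1 to the additive multiple-
access channel $\mathbf{W}$ in Example 7.7.1 with the noise process $\mathbf{Z} = \{Z^n\}_{n=1}^{\infty}$
given in Example 1.6.3 in §1.6. Since we have

\[
J(R_1, R_2|\mathbf{X}_1, \mathbf{X}_2) = \frac{R_1 + R_2}{\log 2}
\]

for $\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$ and $\mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}$ with $X_1^n$ and $X_2^n$ subject to the
uniform distributions over $\mathcal{X}_1^n$ and $\mathcal{X}_2^n$, respectively, the ε-capacity region
$C(\varepsilon|\mathbf{W})$ is given by

\[
C(\varepsilon|\mathbf{W}) = \{(R_1, R_2) \mid R_1 \ge 0,\ R_2 \ge 0,\ R_1 + R_2 \le \varepsilon \log 2\}
\tag{7.11.11}
\]

for all $0 \le \varepsilon < 1$.

□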






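Example 7.11.2. Let us consider the additive multiple-access channel $\mathbf{W}$
in Example 7.7.1 with the noise process $\mathbf{Z} = \{Z^n\}_{n=1}^{\infty}$ defined as the mixed
source of stationary memoryless sources $\mathbf{Z}_1 = \{Z_1^n\}_{n=1}^{\infty}$ and $\mathbf{Z}_2 = \{Z_2^n\}_{n=1}^{\infty}$,
i.e., with $\mathbf{Z} = \{Z^n\}_{n=1}^{\infty}$ defined by

\[
P_{Z^n}(\mathbf{z}) = \alpha_1 P_{Z_1^n}(\mathbf{z}) + \alpha_2 P_{Z_2^n}(\mathbf{z}),
\]

where $Z_1$ and $Z_2$ are subject to the probability distributions $P_{Z_1}(0) = \theta_1$ and
$P_{Z_2}(0) = \theta_2$ ($0 < \theta_1 < \theta_2 < 1/2$), and $\alpha_1 > 0$, $\alpha_2 > 0$ and $\alpha_1 + \alpha_2 = 1$. Then,
from the argument given in Example 7.11.1 it turns out that

\[
C(\varepsilon|\mathbf{W}) = \{(R_1, R_2) \mid R_1 \ge 0,\ R_2 \ge 0,\ R_1 + R_2 \le \log 2 - h(\theta_2)\}
\quad (0 \le \forall \varepsilon < \alpha_2),
\]

\[
C(\varepsilon|\mathbf{W}) = \{(R_1, R_2) \mid R_1 \ge 0,\ R_2 \ge 0,\ R_1 + R_2 \le \log 2 - h(\theta_1)\}
\quad (\alpha_2 \le \forall \varepsilon < 1).
\]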
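The two-level structure of this example is easy to tabulate. The sketch below is a hedged illustration, reading the example as giving the sum-rate bound $\log 2 - h(\theta_2)$ for $\varepsilon$ below $\alpha_2$ and $\log 2 - h(\theta_1)$ otherwise, with $h$ the binary entropy in nats; the function names `h` and `eps_sum_rate_bound` are hypothetical.

```python
import math


def h(theta):
    """Binary entropy in nats; h(0) = h(1) = 0 by convention."""
    if theta in (0.0, 1.0):
        return 0.0
    return -theta * math.log(theta) - (1.0 - theta) * math.log(1.0 - theta)


def eps_sum_rate_bound(eps, theta1, theta2, alpha2):
    """Sum-rate bound R1 + R2 of the eps-capacity region, as read from the example.

    For eps below the weight alpha2 of the noisier component the worse
    component governs; once eps reaches alpha2 the better component does.
    """
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must satisfy 0 <= eps < 1")
    if eps < alpha2:
        return math.log(2.0) - h(theta2)
    return math.log(2.0) - h(theta1)
```

For instance, with $\theta_1 = 0.1$, $\theta_2 = 0.2$ and $\alpha_2 = 0.3$ the bound jumps from $\log 2 - h(0.2)$ to $\log 2 - h(0.1)$ as $\varepsilon$ crosses $0.3$, so the region depends on $\varepsilon$.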



7.12 Strong Converse Theorem for Multiple-Access Channels

This section describes the strong converse property of general
multiple-access channels. It corresponds to §3.5 in Chapter 3.

Definition 7.12.1. Let $C(\mathbf{W})$ be the capacity region of a multiple-access
channel $\mathbf{W} = \{W^n : \mathcal{X}_1^n \times \mathcal{X}_2^n \to \mathcal{Y}^n\}_{n=1}^{\infty}$. If, for any $(R_1, R_2)$ satisfying
$(R_1, R_2) \notin C(\mathbf{W})$, all the $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-codes with

\[
\liminf_{n \to \infty} \frac{1}{n}\log M_n^{(1)} \ge R_1,
\qquad
\liminf_{n \to \infty} \frac{1}{n}\log M_n^{(2)} \ge R_2
\]

satisfy

\[
\lim_{n \to \infty} \varepsilon_n = 1,
\]

the multiple-access channel $\mathbf{W}$ is said to satisfy the strong converse prop-
erty.

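In order to obtain a theorem on the strong converse property, we denote by
$\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ the output from a multiple-access channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ with
arbitrary inputs $\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$ and $\mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}$ satisfying $(\mathbf{X}_1, \mathbf{X}_2) \in
\mathcal{S}_I$. We also define a function of $(R_1, R_2)$ by

\[
\begin{aligned}
J^*(R_1, R_2|\mathbf{X}_1, \mathbf{X}_2) = \liminf_{n \to \infty} \Pr\Big\{ & \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le R_1 \\
& \text{or } \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le R_2 \\
& \text{or } \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le R_1 + R_2 \Big\},
\end{aligned}
\tag{7.12.1}
\]

which is used instead of $J(R_1, R_2|\mathbf{X}_1, \mathbf{X}_2)$ defined in Definition 7.11.3 in
the preceding section. Furthermore, we define the two-dimensional region
$\mathcal{R}^*_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)$ by

\[
\mathcal{R}^*_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)
= \mathrm{Cl}\big(\{(R_1, R_2) \mid R_1 \ge 0,\ R_2 \ge 0,\ J^*(R_1, R_2|\mathbf{X}_1, \mathbf{X}_2) < 1\}\big).
\tag{7.12.2}
\]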



Then, we have the following theorem on the strong converse property. 



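Theorem 7.12.1 (Strong converse theorem: Han [36]). A multiple-
access channel $\mathbf{W}$ satisfies the strong converse property if and only if

\[
\bigcup_{(\mathbf{X}_1,\mathbf{X}_2) \in \mathcal{S}_I} \mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)
= \bigcup_{(\mathbf{X}_1,\mathbf{X}_2) \in \mathcal{S}_I} \mathcal{R}^*_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2),
\tag{7.12.3}
\]

where $\mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)$ is the region defined in Theorem 7.7.1 in §7.7. □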



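Remark 7.12.1. For an arbitrary input pair $(\mathbf{X}_1, \mathbf{X}_2)$ it always holds that

\[
\mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2) \subset \mathcal{R}^*_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2),
\tag{7.12.4}
\]

and therefore,

\[
\bigcup_{(\mathbf{X}_1,\mathbf{X}_2) \in \mathcal{S}_I} \mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)
\subset \bigcup_{(\mathbf{X}_1,\mathbf{X}_2) \in \mathcal{S}_I} \mathcal{R}^*_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)
\tag{7.12.5}
\]

always holds as well. The inclusion (7.12.4) can be verified in the following way.
Consider an arbitrary rate pair $(R_1, R_2) \in \mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)$. By noting the in-
equalities (7.7.4)-(7.7.6) determining $\mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)$, for an arbitrarily small
$\gamma > 0$ it follows that

\[
\begin{aligned}
J^*(R_1 - \gamma, R_2 - \gamma|\mathbf{X}_1, \mathbf{X}_2)
={} & \liminf_{n \to \infty} \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le R_1 - \gamma
\ \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le R_2 - \gamma \\
& \qquad \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le R_1 + R_2 - 2\gamma \Big\} \\
\le{} & \liminf_{n \to \infty} \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le \underline{I}(\mathbf{X}_1;\mathbf{Y}|\mathbf{X}_2) - \gamma
\ \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le \underline{I}(\mathbf{X}_2;\mathbf{Y}|\mathbf{X}_1) - \gamma \\
& \qquad \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le \underline{I}(\mathbf{X}_1\mathbf{X}_2;\mathbf{Y}) - 2\gamma \Big\} \\
\le{} & \limsup_{n \to \infty} \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le \underline{I}(\mathbf{X}_1;\mathbf{Y}|\mathbf{X}_2) - \gamma \Big\} \\
& + \limsup_{n \to \infty} \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le \underline{I}(\mathbf{X}_2;\mathbf{Y}|\mathbf{X}_1) - \gamma \Big\} \\
& + \limsup_{n \to \infty} \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le \underline{I}(\mathbf{X}_1\mathbf{X}_2;\mathbf{Y}) - 2\gamma \Big\}.
\end{aligned}
\tag{7.12.6}
\]

Since all the terms on the right-hand side of (7.12.6) are equal to zero
due to the definition of the spectral (conditional) inf-mutual information
rates, we obtain

\[
J^*(R_1 - \gamma, R_2 - \gamma|\mathbf{X}_1, \mathbf{X}_2) = 0.
\tag{7.12.7}
\]

Equation (7.12.7) means

\[
(R_1, R_2) \in \mathrm{Cl}\big(\{(R_1, R_2) \mid R_1 \ge 0,\ R_2 \ge 0,\ J^*(R_1, R_2|\mathbf{X}_1, \mathbf{X}_2) < 1\}\big)
\]

because $\gamma > 0$ is arbitrary. Hence, we have

\[
(R_1, R_2) \in \mathcal{R}^*_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)
\]

from (7.12.2). □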



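Proof of Theorem 7.12.1.

1) Sufficiency:

Fix a rate pair $(R_1, R_2)$ satisfying $(R_1, R_2) \notin C(\mathbf{W})$ arbitrarily and con-
sider an arbitrary $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code satisfying

\[
\liminf_{n \to \infty} \frac{1}{n}\log M_n^{(1)} \ge R_1,
\tag{7.12.8}
\]

\[
\liminf_{n \to \infty} \frac{1}{n}\log M_n^{(2)} \ge R_2.
\tag{7.12.9}
\]

Let us define a pair of input processes

\[
(\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}, \mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}) \in \mathcal{S}_I
\]

on this code as is defined in Lemma 7.10.2 and denote by $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ the
channel output corresponding to $(\mathbf{X}_1, \mathbf{X}_2)$. Since the capacity region $C(\mathbf{W})$ is
closed, we can choose a sufficiently small $\gamma > 0$ satisfying $(R_1 - 2\gamma, R_2 - 2\gamma) \notin
C(\mathbf{W})$ for a given $(R_1, R_2) \notin C(\mathbf{W})$. Then, Theorem 7.7.1 and (7.12.3) imply
$J^*(R_1 - 2\gamma, R_2 - 2\gamma|\mathbf{X}_1, \mathbf{X}_2) = 1$, that is,

\[
\begin{aligned}
\liminf_{n \to \infty} \Pr\Big\{ & \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le R_1 - 2\gamma
\ \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le R_2 - 2\gamma \\
& \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le R_1 + R_2 - 4\gamma \Big\} = 1.
\end{aligned}
\tag{7.12.10}
\]

On the other hand, since (7.12.8) and (7.12.9) yield

\[
\frac{1}{n}\log M_n^{(1)} \ge R_1 - \gamma \quad (\forall n \ge n_0),
\qquad
\frac{1}{n}\log M_n^{(2)} \ge R_2 - \gamma \quad (\forall n \ge n_0),
\]

by substituting these inequalities into (7.10.26) in Lemma 7.10.2, we obtain

\[
\begin{aligned}
\varepsilon_n \ge{} & \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le R_1 - 2\gamma
\ \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le R_2 - 2\gamma \\
& \qquad \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le R_1 + R_2 - 3\gamma \Big\} - 3e^{-n\gamma} \\
\ge{} & \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le R_1 - 2\gamma
\ \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le R_2 - 2\gamma \\
& \qquad \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le R_1 + R_2 - 4\gamma \Big\} - 3e^{-n\gamma}.
\end{aligned}
\tag{7.12.11}
\]

By taking $\liminf_{n \to \infty}$ of both sides of (7.12.11) and recalling (7.12.10), $\lim_{n \to \infty} \varepsilon_n = 1$
follows.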

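2) Necessity:

Since (7.12.5) always holds in view of Remark 7.12.1, it suffices to show

\[
\bigcup_{(\mathbf{X}_1,\mathbf{X}_2) \in \mathcal{S}_I} \mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)
\supset \bigcup_{(\mathbf{X}_1,\mathbf{X}_2) \in \mathcal{S}_I} \mathcal{R}^*_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2).
\tag{7.12.12}
\]

To this end, for an arbitrary input pair $(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_I$ we arbitrarily choose a
rate pair satisfying

\[
(R_1, R_2) \in \mathrm{Cl}\big(\{(R_1, R_2) \mid R_1 \ge 0,\ R_2 \ge 0,\ J^*(R_1, R_2|\mathbf{X}_1, \mathbf{X}_2) < 1\}\big).
\]

Then, we have

\[
J^*(R_1 - \gamma, R_2 - \gamma|\mathbf{X}_1, \mathbf{X}_2) < 1,
\tag{7.12.13}
\]

where $\gamma > 0$ is an arbitrary constant. Setting

\[
M_n^{(1)} = e^{n(R_1 - 2\gamma)}, \qquad M_n^{(2)} = e^{n(R_2 - 2\gamma)},
\]

Lemma 7.10.1 guarantees the existence of an $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-code satis-
fying

\[
\begin{aligned}
\varepsilon_n \le{} & \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le R_1 - \gamma
\ \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le R_2 - \gamma \\
& \qquad \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le R_1 + R_2 - 3\gamma \Big\} + 3e^{-n\gamma} \\
\le{} & \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le R_1 - \gamma
\ \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le R_2 - \gamma \\
& \qquad \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le R_1 + R_2 - 2\gamma \Big\} + 3e^{-n\gamma}.
\end{aligned}
\tag{7.12.14}
\]

By taking $\liminf_{n \to \infty}$ of both sides of (7.12.14), it follows from (7.12.13)
that

\[
\liminf_{n \to \infty} \varepsilon_n \le J^*(R_1 - \gamma, R_2 - \gamma|\mathbf{X}_1, \mathbf{X}_2) < 1.
\tag{7.12.15}
\]

Now, let us assume that the channel satisfies the strong converse property. If
$(R_1 - 2\gamma, R_2 - 2\gamma) \notin C(\mathbf{W})$, then $\lim_{n \to \infty} \varepsilon_n = 1$ must hold. However, this contradicts
(7.12.15). Hence, $(R_1 - 2\gamma, R_2 - 2\gamma) \in C(\mathbf{W})$ must be satisfied. This means
that $(R_1, R_2) \in C(\mathbf{W})$ because $C(\mathbf{W})$ is closed and $\gamma > 0$ is arbitrary. Then,
Theorem 7.7.1 implies

\[
(R_1, R_2) \in \bigcup_{(\mathbf{X}_1,\mathbf{X}_2) \in \mathcal{S}_I} \mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2),
\]

which establishes (7.12.12). □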






Remark 7.12.2. If a multiple-access channel $\mathbf{W}$ satisfies the strong converse
property, the ε-capacity region of $\mathbf{W}$ does not depend on ε, i.e.,

\[
C(\varepsilon|\mathbf{W}) = C(\mathbf{W}) \quad (0 \le \forall \varepsilon < 1).
\]

Hence, the multiple-access channel $\mathbf{W}$ given in Example 7.11.1 does not
satisfy the strong converse property. □

Example 7.12.1. Let us consider the additive multiple-access channel $\mathbf{W}$
given in Example 7.7.1 again. If the noise process $\mathbf{Z} = \{Z^n\}_{n=1}^{\infty}$ is stationary
and memoryless, then $\mathbf{W}$ satisfies the strong converse property. This property
can be shown by considering the input pair

\[
(\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}, \mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty})
\]

subject to the uniform distributions over $\mathcal{X}_1^n$ and $\mathcal{X}_2^n$, respectively, in Theo-
rem 7.12.1. □



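Now, denoting by $\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ the output of a multiple-access chan-
nel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ with two arbitrary inputs $\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}$ and $\mathbf{X}_2 =
\{X_2^n\}_{n=1}^{\infty}$ satisfying $(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_I$, we define

\[
\overline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) = \text{p-}\limsup_{n \to \infty} \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)},
\tag{7.12.16}
\]

\[
\overline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1) = \text{p-}\limsup_{n \to \infty} \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)},
\tag{7.12.17}
\]

\[
\overline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}) = \text{p-}\limsup_{n \to \infty} \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)},
\tag{7.12.18}
\]

which are called the spectral (conditional) sup-mutual information rates of
$\mathbf{W}$. If we define $\overline{\mathcal{R}}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)$ as the collection of all $(R_1, R_2)$ satisfying

\[
0 \le R_1 \le \overline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2),
\tag{7.12.19}
\]

\[
0 \le R_2 \le \overline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1),
\tag{7.12.20}
\]

\[
R_1 + R_2 \le \overline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}),
\tag{7.12.21}
\]

we have the following corollary on the strong converse property.

Corollary 7.12.1. If

\[
\bigcup_{(\mathbf{X}_1,\mathbf{X}_2) \in \mathcal{S}_I} \mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)
= \bigcup_{(\mathbf{X}_1,\mathbf{X}_2) \in \mathcal{S}_I} \overline{\mathcal{R}}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)
\tag{7.12.22}
\]

for a multiple-access channel $\mathbf{W}$, then $\mathbf{W}$ satisfies the strong converse prop-
erty, where $\mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)$ is the region defined in Theorem 7.7.1 in §7.7.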



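Proof. In view of Theorem 7.12.1 and Remark 7.12.1, it suffices to show

\[
\bigcup_{(\mathbf{X}_1,\mathbf{X}_2) \in \mathcal{S}_I} \mathcal{R}^*_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)
\subset \bigcup_{(\mathbf{X}_1,\mathbf{X}_2) \in \mathcal{S}_I} \overline{\mathcal{R}}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2).
\tag{7.12.23}
\]

To this end, for an arbitrary $(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_I$ define $(R_1, R_2)$ as an arbitrary
rate pair belonging to

\[
\mathrm{Cl}\big(\{(R_1, R_2) \mid R_1 \ge 0,\ R_2 \ge 0,\ J^*(R_1, R_2|\mathbf{X}_1, \mathbf{X}_2) < 1\}\big).
\]

Then, it follows that $J^*(R_1 - \gamma, R_2 - \gamma|\mathbf{X}_1, \mathbf{X}_2) < 1$, i.e.,

\[
\begin{aligned}
\liminf_{n \to \infty} \Pr\Big\{ & \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le R_1 - \gamma
\ \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le R_2 - \gamma \\
& \text{or}\ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le R_1 + R_2 - 2\gamma \Big\} < 1,
\end{aligned}
\]

where $\gamma > 0$ is an arbitrary constant. Therefore, we obtain

\[
\liminf_{n \to \infty} \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} \le R_1 - \gamma \Big\} < 1,
\]

\[
\liminf_{n \to \infty} \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_1^n}(Y^n|X_1^n)} \le R_2 - \gamma \Big\} < 1,
\]

\[
\liminf_{n \to \infty} \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n}(Y^n)} \le R_1 + R_2 - 2\gamma \Big\} < 1.
\]

Then, by using the definition of the spectral (conditional) sup-mutual infor-
mation rates, we have

\[
R_1 - \gamma \le \overline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2), \quad
R_2 - \gamma \le \overline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1), \quad
R_1 + R_2 - 2\gamma \le \overline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}).
\]

Since $\gamma > 0$ is arbitrary, it holds that

\[
R_1 \le \overline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2), \quad
R_2 \le \overline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1), \quad
R_1 + R_2 \le \overline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}),
\]

which establishes $(R_1, R_2) \in \overline{\mathcal{R}}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)$. □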




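7.13 Multiple-Access Channels with Cost Constraint

In this section we consider coding of multiple-access channels with input cost
constraint. This formulation can be regarded as a generalization of coding of
single-user channels with input cost constraint treated in §3.6 in Chapter 3.
First, we define cost functions $c_n^{(1)} : \mathcal{X}_1^n \to \mathbf{R}$ and $c_n^{(2)} : \mathcal{X}_2^n \to \mathbf{R}$ for
the encoders 1 and 2 of a multiple-access channel $\mathbf{W}$, respectively, where
$\mathbf{R}$ denotes the set of all real numbers. If a code $(\mathcal{C}_1, \mathcal{C}_2)$ such as

\[
\mathcal{C}_1 = \{\mathbf{u}_1, \ldots, \mathbf{u}_{M_n^{(1)}}\} \subset \mathcal{X}_1^n, \qquad
\mathcal{C}_2 = \{\mathbf{v}_1, \ldots, \mathbf{v}_{M_n^{(2)}}\} \subset \mathcal{X}_2^n
\]

satisfies the cost constraint

\[
\frac{1}{n} c_n^{(1)}(\mathbf{u}_i) \le \Gamma_1 \quad (i = 1, \ldots, M_n^{(1)}),
\tag{7.13.1}
\]

\[
\frac{1}{n} c_n^{(2)}(\mathbf{v}_j) \le \Gamma_2 \quad (j = 1, \ldots, M_n^{(2)}),
\tag{7.13.2}
\]

the code $(\mathcal{C}_1, \mathcal{C}_2)$ is called a $(\Gamma_1, \Gamma_2)$-code for the multiple-access channel
$\mathbf{W}$. We call the collection of all rate pairs $(R_1, R_2)$ that are achievable by
$(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-codes restricted to the class of $(\Gamma_1, \Gamma_2)$-codes for all
$n = 1, 2, \ldots$ the $(\Gamma_1, \Gamma_2)$-capacity region of $\mathbf{W}$ and denote it by $C_{\Gamma_1,\Gamma_2}(\mathbf{W})$.

In order to obtain a formula giving the $(\Gamma_1, \Gamma_2)$-capacity region, denote
by $\mathcal{S}_{\Gamma_1,\Gamma_2}$ the pairs of input processes

\[
(\mathbf{X}_1, \mathbf{X}_2) = \{(X_1^n, X_2^n)\}_{n=1}^{\infty} \in \mathcal{S}_I
\]

satisfying

\[
\Pr\Big\{ \frac{1}{n} c_n^{(1)}(X_1^n) \le \Gamma_1 \Big\} = \Pr\Big\{ \frac{1}{n} c_n^{(2)}(X_2^n) \le \Gamma_2 \Big\} = 1
\tag{7.13.3}
\]

for all $n = 1, 2, \ldots$. Letting $\mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)$ be the rate region defined in Theo-
rem 7.7.1, we have the following theorem:

Theorem 7.13.1 (Han [36]). Let $\mathcal{X}_1$ and $\mathcal{X}_2$ be arbitrary input alphabets and
$\mathcal{Y}$ an arbitrary output alphabet ($\mathcal{X}_1$, $\mathcal{X}_2$ and $\mathcal{Y}$ can be arbitrary (not restricted
to finite) sets). Then, the $(\Gamma_1, \Gamma_2)$-capacity region $C_{\Gamma_1,\Gamma_2}(\mathbf{W})$ of a multiple-
access channel $\mathbf{W}$ is given by

\[
C_{\Gamma_1,\Gamma_2}(\mathbf{W}) = \bigcup_{(\mathbf{X}_1,\mathbf{X}_2) \in \mathcal{S}_{\Gamma_1,\Gamma_2}} \mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2).
\tag{7.13.4}
\]

Proof. We can prove this theorem by using the argument given in the proof
of Theorem 7.7.1, noting the cost constraint given in (7.13.1) and (7.13.2). □

Here, we define the strong converse property of a multiple-access channel
with cost constraint as the property that in Definition 7.12.1 $C(\mathbf{W})$ is replaced
with $C_{\Gamma_1,\Gamma_2}(\mathbf{W})$ and $(n, M_n^{(1)}, M_n^{(2)}, \varepsilon_n)$-codes are restricted to $(\Gamma_1, \Gamma_2)$-codes.
Then, we have the following strong converse theorem, which can be easily
proved similarly to Theorem 7.12.1.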



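Theorem 7.13.2 (Strong converse theorem). A multiple-access channel
$\mathbf{W}$ with cost constraint satisfies the strong converse property if and only if

\[
\bigcup_{(\mathbf{X}_1,\mathbf{X}_2) \in \mathcal{S}_{\Gamma_1,\Gamma_2}} \mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)
= \bigcup_{(\mathbf{X}_1,\mathbf{X}_2) \in \mathcal{S}_{\Gamma_1,\Gamma_2}} \mathcal{R}^*_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2),
\tag{7.13.5}
\]

where $\mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)$ and $\mathcal{R}^*_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)$ are the regions defined in Theorem 7.7.1
in §7.7 and (7.12.2), respectively. □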



Next, let us consider a special case that the cost functions $c_n^{(1)}$ and $c_n^{(2)}$
are expressed as

\[
c_n^{(1)}(\mathbf{x}_1) = \sum_{i=1}^{n} c_1(x_{1i}), \quad \mathbf{x}_1 = (x_{11}, \ldots, x_{1n}) \in \mathcal{X}_1^n,
\]

\[
c_n^{(2)}(\mathbf{x}_2) = \sum_{i=1}^{n} c_2(x_{2i}), \quad \mathbf{x}_2 = (x_{21}, \ldots, x_{2n}) \in \mathcal{X}_2^n
\]

by using functions $c_1 : \mathcal{X}_1 \to \mathbf{R}$ and $c_2 : \mathcal{X}_2 \to \mathbf{R}$, respectively. The cost
functions with such properties are called additive.

By using the formula (7.13.4) in Theorem 7.13.1, we can express the
$(\Gamma_1, \Gamma_2)$-capacity region of a stationary memoryless multiple-access channel
$\mathbf{W} = \{W\}$ with additive cost constraint in a "computable" form. Denoting by
$\mathcal{J}_{M;\Gamma_1,\Gamma_2}$ the collection of all triples $(X_1, X_2, Q)$ of random variables satisfying
$\mathrm{E}c_1(X_1) \le \Gamma_1$ and $\mathrm{E}c_2(X_2) \le \Gamma_2$ with the property that $X_1$ and $X_2$ are
independent given $Q$, we have the following theorem:

Theorem 7.13.3 (Han [36]). Let $\mathcal{X}_1$ and $\mathcal{X}_2$ be arbitrary input alphabets and
$\mathcal{Y}$ an arbitrary output alphabet ($\mathcal{X}_1$, $\mathcal{X}_2$ and $\mathcal{Y}$ can be arbitrary (not restricted
to finite) sets). Then, the $(\Gamma_1, \Gamma_2)$-capacity region $C_{\Gamma_1,\Gamma_2}(\mathbf{W})$ of a stationary
memoryless multiple-access channel $\mathbf{W}$ with additive cost constraint
is given by

\[
C_{\Gamma_1,\Gamma_2}(\mathbf{W}) = \mathrm{Cl}\Big( \bigcup_{(X_1,X_2,Q) \in \mathcal{J}_{M;\Gamma_1,\Gamma_2}} \mathcal{R}_W(X_1, X_2|Q) \Big),
\tag{7.13.6}
\]

where $\mathcal{R}_W(X_1, X_2|Q)$ is the region defined in Theorem 7.8.1 and the cardi-
nality of $Q$ can be restricted to at most 5. □

Remark 7.13.1. The set on the right-hand side of (7.13.6) is convex for
each $(\Gamma_1, \Gamma_2)$. □



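Proof of Theorem 7.13.3.

If we apply an argument parallel to the proof of the converse part of
Theorem 7.8.1, it is easily verified that

\[
\mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2) \subset \mathrm{Cl}\Big( \bigcup_{(X_1,X_2,Q) \in \mathcal{J}_{M;\Gamma_1,\Gamma_2}} \mathcal{R}_W(X_1, X_2|Q) \Big)
\]

for arbitrary $(\mathbf{X}_1, \mathbf{X}_2) \in \mathcal{S}_{\Gamma_1,\Gamma_2}$. Hence, we hereafter show the inclusion
opposite to the one above. To this end, we first notice that

\[
\mathcal{R}_{\Gamma_1,\Gamma_2} \equiv \bigcup_{(X_1,X_2,Q) \in \mathcal{J}_{M;\Gamma_1,\Gamma_2}} \mathcal{R}_W(X_1, X_2|Q)
\tag{7.13.7}
\]

is "continuous" with respect to $(\Gamma_1, \Gamma_2)$ in the sense that for an arbitrary
$(R_1, R_2) \in \mathcal{R}_{\Gamma_1,\Gamma_2}$ there exists an $(R_1', R_2') \in \mathcal{R}_{\Gamma_1-\delta,\Gamma_2-\delta}$ satisfying

\[
R_1' \ge R_1 - \mu(\delta), \qquad R_2' \ge R_2 - \mu(\delta),
\]

where $\mu(\delta) \to 0$ as $\delta \to 0$. This property can be proved by using the fact that
the two functions

\[
G_1(\Gamma_1, \Gamma_2|R_1) = \sup\{R_2 \mid (R_1, R_2) \in \mathcal{R}_{\Gamma_1,\Gamma_2}\},
\]

\[
G_2(\Gamma_1, \Gamma_2|R_2) = \sup\{R_1 \mid (R_1, R_2) \in \mathcal{R}_{\Gamma_1,\Gamma_2}\}
\]

are concave with respect to $(\Gamma_1, \Gamma_2)$ for arbitrarily fixed $R_1 \ge 0$ and $R_2 \ge 0$.
Furthermore, we can easily verify that

\[
\bigcup_{(\mathbf{X}_1,\mathbf{X}_2) \in \mathcal{S}_{\Gamma_1,\Gamma_2}} \mathcal{R}_{\mathbf{W}}(\mathbf{X}_1, \mathbf{X}_2)
\]

is a closed set (cf. Remark 5.7.5 in Chapter 5). Hence, in order to prove
the theorem, it suffices to show that for an arbitrarily small $\delta > 0$ and
$(X_1, X_2, Q) \in \mathcal{J}_{M;\Gamma_1-\delta,\Gamma_2-\delta}$ it holds that

\[
\mathcal{R}_W(X_1, X_2|Q) \subset \mathcal{R}_{\mathbf{W}}(\tilde{\mathbf{X}}_1, \tilde{\mathbf{X}}_2)
\]

for some $(\tilde{\mathbf{X}}_1, \tilde{\mathbf{X}}_2) \in \mathcal{S}_{\Gamma_1,\Gamma_2}$. First, for an arbitrarily given $(X_1, X_2, Q) \in
\mathcal{J}_{M;\Gamma_1-\delta,\Gamma_2-\delta}$ define the triplet

\[
(\mathbf{X}_1 = \{X_1^n\}_{n=1}^{\infty}, \mathbf{X}_2 = \{X_2^n\}_{n=1}^{\infty}, \mathbf{Y} = \{Y^n\}_{n=1}^{\infty})
\]

of two input processes and one output process as is defined in the proof of
the direct part of Theorem 7.8.1. Then, we have (7.8.10)-(7.8.12), that is,

\[
\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) = I(X_1; Y|X_2 Q),
\tag{7.13.8}
\]

\[
\underline{I}(\mathbf{X}_2; \mathbf{Y}|\mathbf{X}_1) = I(X_2; Y|X_1 Q),
\tag{7.13.9}
\]

\[
\underline{I}(\mathbf{X}_1\mathbf{X}_2; \mathbf{Y}) = I(X_1 X_2; Y|Q),
\tag{7.13.10}
\]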

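where $Y$ denotes the channel output corresponding to the input pair $(X_1, X_2)$.
However, the input processes $\mathbf{X}_1$ and $\mathbf{X}_2$ defined in this way do not always
satisfy the cost constraint (7.13.3). Thus, we construct other input processes
$\tilde{\mathbf{X}}_1$ and $\tilde{\mathbf{X}}_2$ satisfying (7.13.3) in the following way. First, we note that, since
$X_1^n = (X_{11}^{(n)}, \ldots, X_{1n}^{(n)})$ and $X_2^n = (X_{21}^{(n)}, \ldots, X_{2n}^{(n)})$ are independent and
satisfy

\[
\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \mathrm{E}c_1(X_{1i}^{(n)}) = \mathrm{E}c_1(X_1) \le \Gamma_1 - \delta,
\]

\[
\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \mathrm{E}c_2(X_{2i}^{(n)}) = \mathrm{E}c_2(X_2) \le \Gamma_2 - \delta,
\]

Khintchin's law of large numbers tells us that

\[
\alpha_n \equiv \Pr\Big\{ \frac{1}{n} \sum_{i=1}^{n} c_1(X_{1i}^{(n)}) \le \Gamma_1 \Big\} \to 1 \quad \text{as } n \to \infty,
\tag{7.13.11}
\]

\[
\beta_n \equiv \Pr\Big\{ \frac{1}{n} \sum_{i=1}^{n} c_2(X_{2i}^{(n)}) \le \Gamma_2 \Big\} \to 1 \quad \text{as } n \to \infty.
\tag{7.13.12}
\]

Next, we set

\[
S_n = \Big\{ \mathbf{x}_1 \in \mathcal{X}_1^n \,\Big|\, \frac{1}{n} \sum_{i=1}^{n} c_1(x_{1i}) \le \Gamma_1 \Big\},
\tag{7.13.13}
\]

\[
T_n = \Big\{ \mathbf{x}_2 \in \mathcal{X}_2^n \,\Big|\, \frac{1}{n} \sum_{i=1}^{n} c_2(x_{2i}) \le \Gamma_2 \Big\},
\tag{7.13.14}
\]

and define the pair of input processes $(\tilde{\mathbf{X}}_1 = \{\tilde{X}_1^n\}_{n=1}^{\infty}, \tilde{\mathbf{X}}_2 = \{\tilde{X}_2^n\}_{n=1}^{\infty})$ by

\[
P_{\tilde{X}_1^n}(\mathbf{x}_1) =
\begin{cases}
\dfrac{1}{\alpha_n} P_{X_1^n}(\mathbf{x}_1) & \text{for } \mathbf{x}_1 \in S_n, \\
0 & \text{for } \mathbf{x}_1 \notin S_n,
\end{cases}
\qquad
P_{\tilde{X}_2^n}(\mathbf{x}_2) =
\begin{cases}
\dfrac{1}{\beta_n} P_{X_2^n}(\mathbf{x}_2) & \text{for } \mathbf{x}_2 \in T_n, \\
0 & \text{for } \mathbf{x}_2 \notin T_n.
\end{cases}
\]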

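Clearly, $(\tilde{\mathbf{X}}_1, \tilde{\mathbf{X}}_2) \in \mathcal{S}_{\Gamma_1,\Gamma_2}$. Now, denoting by $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ a given channel
and $\tilde{\mathbf{Y}} = \{\tilde{Y}^n\}_{n=1}^{\infty}$ the output of the channel $\mathbf{W}$ corresponding to $(\tilde{\mathbf{X}}_1, \tilde{\mathbf{X}}_2)$,
we have

\[
\begin{aligned}
P_{Y^n|X_2^n}(\mathbf{y}|\mathbf{x}_2) &= \sum_{\mathbf{x}_1 \in \mathcal{X}_1^n} W^n(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2) P_{X_1^n}(\mathbf{x}_1) \\
&\ge \sum_{\mathbf{x}_1 \in S_n} W^n(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2) P_{X_1^n}(\mathbf{x}_1) \\
&= \alpha_n \sum_{\mathbf{x}_1 \in S_n} W^n(\mathbf{y}|\mathbf{x}_1, \mathbf{x}_2) P_{\tilde{X}_1^n}(\mathbf{x}_1) \\
&= \alpha_n P_{\tilde{Y}^n|\tilde{X}_2^n}(\mathbf{y}|\mathbf{x}_2)
\end{aligned}
\]

for all $\mathbf{x}_2 \in \mathcal{X}_2^n$, $\mathbf{y} \in \mathcal{Y}^n$, which leads to

\[
\frac{1}{n}\log\frac{1}{P_{Y^n|X_2^n}(\mathbf{y}|\mathbf{x}_2)} \le \frac{1}{n}\log\frac{1}{P_{\tilde{Y}^n|\tilde{X}_2^n}(\mathbf{y}|\mathbf{x}_2)} + \frac{1}{n}\log\frac{1}{\alpha_n}.
\tag{7.13.15}
\]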

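Therefore, for an arbitrary $\gamma$ satisfying $\gamma < \underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2)$, it holds that

\[
\Pr\Big\{ \frac{1}{n}\log\frac{W^n(\tilde{Y}^n|\tilde{X}_1^n,\tilde{X}_2^n)}{P_{\tilde{Y}^n|\tilde{X}_2^n}(\tilde{Y}^n|\tilde{X}_2^n)} < \gamma - \frac{1}{n}\log\frac{1}{\alpha_n} \Big\}
\le \Pr\Big\{ \frac{1}{n}\log\frac{W^n(\tilde{Y}^n|\tilde{X}_1^n,\tilde{X}_2^n)}{P_{Y^n|X_2^n}(\tilde{Y}^n|\tilde{X}_2^n)} < \gamma \Big\}.
\tag{7.13.16}
\]

On the other hand, since we have

\[
\Pr\Big\{ \frac{1}{n}\log\frac{W^n(\tilde{Y}^n|\tilde{X}_1^n,\tilde{X}_2^n)}{P_{Y^n|X_2^n}(\tilde{Y}^n|\tilde{X}_2^n)} < \gamma \Big\}
\le \frac{1}{\alpha_n \beta_n} \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} < \gamma \Big\},
\]

by substituting this inequality into (7.13.16) we obtain

\[
\Pr\Big\{ \frac{1}{n}\log\frac{W^n(\tilde{Y}^n|\tilde{X}_1^n,\tilde{X}_2^n)}{P_{\tilde{Y}^n|\tilde{X}_2^n}(\tilde{Y}^n|\tilde{X}_2^n)} < \gamma - \frac{1}{n}\log\frac{1}{\alpha_n} \Big\}
\le \frac{1}{\alpha_n \beta_n} \Pr\Big\{ \frac{1}{n}\log\frac{W^n(Y^n|X_1^n,X_2^n)}{P_{Y^n|X_2^n}(Y^n|X_2^n)} < \gamma \Big\}.
\]

If we notice that $\alpha_n \to 1$ and $\beta_n \to 1$ as $n \to \infty$ and $\gamma < \underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2)$, this
inequality implies

\[
\lim_{n \to \infty} \Pr\Big\{ \frac{1}{n}\log\frac{W^n(\tilde{Y}^n|\tilde{X}_1^n,\tilde{X}_2^n)}{P_{\tilde{Y}^n|\tilde{X}_2^n}(\tilde{Y}^n|\tilde{X}_2^n)} < \gamma - \frac{1}{n}\log\frac{1}{\alpha_n} \Big\} = 0.
\tag{7.13.17}
\]

Since $\gamma$ is arbitrary as far as it satisfies $\gamma < \underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2)$, (7.13.17) means

\[
\underline{I}(\mathbf{X}_1; \mathbf{Y}|\mathbf{X}_2) \le \underline{I}(\tilde{\mathbf{X}}_1; \tilde{\mathbf{Y}}|\tilde{\mathbf{X}}_2).
\]

This leads to

\[
I(X_1; Y|X_2 Q) \le \underline{I}(\tilde{\mathbf{X}}_1; \tilde{\mathbf{Y}}|\tilde{\mathbf{X}}_2)
\tag{7.13.18}
\]

due to (7.13.8). Similarly, we can also obtain

\[
I(X_2; Y|X_1 Q) \le \underline{I}(\tilde{\mathbf{X}}_2; \tilde{\mathbf{Y}}|\tilde{\mathbf{X}}_1),
\tag{7.13.19}
\]

\[
I(X_1 X_2; Y|Q) \le \underline{I}(\tilde{\mathbf{X}}_1 \tilde{\mathbf{X}}_2; \tilde{\mathbf{Y}}).
\tag{7.13.20}
\]

We now have

\[
\mathcal{R}_W(X_1, X_2|Q) \subset \mathcal{R}_{\mathbf{W}}(\tilde{\mathbf{X}}_1, \tilde{\mathbf{X}}_2),
\]

where $(\tilde{\mathbf{X}}_1, \tilde{\mathbf{X}}_2) \in \mathcal{S}_{\Gamma_1,\Gamma_2}$ is clear from its construction.

□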
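The truncation step of the proof relies on the normalizing constant $\alpha_n$ tending to 1, so that the penalty $\frac{1}{n}\log\frac{1}{\alpha_n}$ in (7.13.15) vanishes. The sketch below illustrates this numerically under assumptions that are not part of the text: binary inputs that are i.i.d. Bernoulli($q$) with cost $c_1(x) = x$, so that $\mathrm{E}c_1(X_1) = q \le \Gamma_1 - \delta$ and $\alpha_n$ is a binomial distribution function. The names `alpha_n` and `penalty` are hypothetical.

```python
import math


def alpha_n(n, q, gamma1):
    """Pr{(1/n) * sum of n i.i.d. Bernoulli(q) costs <= gamma1}, computed exactly."""
    k_max = min(n, math.floor(n * gamma1 + 1e-9))   # largest admissible total cost
    total = 0.0
    for k in range(k_max + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(q) + (n - k) * math.log(1.0 - q))
        total += math.exp(log_pmf)
    return min(total, 1.0)                           # guard against rounding above 1


def penalty(n, q, gamma1):
    """The rate penalty (1/n) log(1/alpha_n) appearing in (7.13.15)."""
    return math.log(1.0 / alpha_n(n, q, gamma1)) / n
```

With $q = 0.3$ and $\Gamma_1 = 0.4$ (so $\delta = 0.1$), `alpha_n` grows from roughly 0.85 at $n = 10$ to well above 0.999 at $n = 1000$, and the penalty shrinks accordingly.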



Let us apply Theorem 7.13.3 to the following additive white Gaussian
noise (AWGN) stationary memoryless channel $\mathbf{W}$ with additive cost con-
straint (see §3.7 in Chapter 3). Suppose that signal powers $P_1$ and $P_2$ are
assigned to encoders 1 and 2 of the channel $\mathbf{W}$ as cost constraint and the
channel output $Y$ corresponding to an input pair $(X_1, X_2)$ is given by

\[
Y = X_1 + X_2 + Z,
\]

where the Gaussian noise $Z$ with noise power $N$ is independent of the input
pair $(X_1, X_2)$. Then, the maximum entropy theorem (see §3.7 in Chapter 3)
and Theorem 7.13.3 immediately yield the following well-known result.

Corollary 7.13.1 (Wyner [103]). The capacity region $C_{P_1,P_2}(\mathbf{W})$ of the
AWGN multiple-access channel $\mathbf{W}$ with signal power $(P_1, P_2)$ and noise power
$N$ is given by

\[
C_{P_1,P_2}(\mathbf{W}) = \Big\{ (R_1, R_2) \,\Big|\,
0 \le R_1 \le \frac{1}{2}\log\Big(1 + \frac{P_1}{N}\Big),\
0 \le R_2 \le \frac{1}{2}\log\Big(1 + \frac{P_2}{N}\Big),\
R_1 + R_2 \le \frac{1}{2}\log\Big(1 + \frac{P_1 + P_2}{N}\Big) \Big\}.
\tag{7.13.21}
\]

At the end of this section, let us consider the capacity region of the mixed
channel $\mathbf{W}$ of stationary memoryless multiple-access channels with additive
cost constraint, which corresponds to Theorem 7.9.2. By using the same ar-
gument as in the proof of Theorem 7.13.3, it turns out that the $(\Gamma_1, \Gamma_2)$-capacity
region of such a channel can be expressed as follows:

Theorem 7.13.4 (Han [36]). Let $\mathcal{X}_1$ and $\mathcal{X}_2$ be arbitrary input alphabets and
$\mathcal{Y}$ an arbitrary output alphabet ($\mathcal{X}_1$, $\mathcal{X}_2$ and $\mathcal{Y}$ are not necessarily finite sets).
Denote by $\mathbf{W}$ the mixed channel of two stationary memoryless multiple-access
channels $\mathbf{W}_1 = \{W_1\}$ and $\mathbf{W}_2 = \{W_2\}$. Then, the $(\Gamma_1, \Gamma_2)$-capacity region
$C_{\Gamma_1,\Gamma_2}(\mathbf{W})$ of the mixed channel $\mathbf{W}$ with additive cost constraint is given by

\[
C_{\Gamma_1,\Gamma_2}(\mathbf{W}) = \mathrm{Cl}\Big( \bigcup_{(X_1,X_2,Q) \in \mathcal{J}_{M;\Gamma_1,\Gamma_2}} \mathcal{R}_{W_1,W_2}(X_1, X_2|Q) \Big),
\tag{7.13.22}
\]

where $\mathcal{R}_{W_1,W_2}(X_1, X_2|Q)$ is the region defined in Theorem 7.9.2 and the car-
dinality of $Q$ can be restricted to at most 8.

□



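The combination of Theorem 7.13.4 and Corollary 7.13.1 immediately
yields the following result:

Corollary 7.13.2. Let $\boldsymbol{W}_1$ and $\boldsymbol{W}_2$ be the AWGN stationary memoryless
multiple-access channels with noise powers $N_1$ and $N_2$, respectively. Then,
the capacity region $C_{P_1,P_2}(\boldsymbol{W})$ of the mixed channel $\boldsymbol{W}$ of $\boldsymbol{W}_1$ and $\boldsymbol{W}_2$ with
signal power $(P_1, P_2)$ (cost constraint) is given by

\[ C_{P_1,P_2}(\boldsymbol{W}) = \left\{ (R_1, R_2) \;\middle|\; 0 \le R_1 \le \frac{1}{2}\log\left(1 + \frac{P_1}{N_{\max}}\right),\ 0 \le R_2 \le \frac{1}{2}\log\left(1 + \frac{P_2}{N_{\max}}\right),\ R_1 + R_2 \le \frac{1}{2}\log\left(1 + \frac{P_1 + P_2}{N_{\max}}\right) \right\}, \tag{7.13.23} \]

where $N_{\max} = \max(N_1, N_2)$.

□

Remark 7.13.2. The mixed channel $\boldsymbol{W}$ of the AWGN stationary memoryless
multiple-access channels given in Corollary 7.13.2 does not satisfy the
strong converse property.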



□ 
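As a numerical illustration of Corollaries 7.13.1 and 7.13.2, the following minimal sketch evaluates the three rate bounds of an AWGN multiple-access channel and of a mixture of two such channels. It assumes the standard pentagon form of the region (individual bounds $\frac{1}{2}\log(1 + P_i/N)$ and sum bound $\frac{1}{2}\log(1 + (P_1 + P_2)/N)$) and natural logarithms, so rates are in nats per channel use. The function names `awgn_mac_bounds`, `mixed_awgn_mac_bounds`, and `in_region` are hypothetical and do not come from the text.

```python
import math


def awgn_mac_bounds(p1, p2, noise):
    """Return (r1_max, r2_max, sum_max) for an AWGN multiple-access channel.

    Assumes the standard pentagon-shaped region; rates are in nats.
    """
    if p1 < 0 or p2 < 0 or noise <= 0:
        raise ValueError("powers must be nonnegative and noise power positive")
    r1_max = 0.5 * math.log(1.0 + p1 / noise)
    r2_max = 0.5 * math.log(1.0 + p2 / noise)
    sum_max = 0.5 * math.log(1.0 + (p1 + p2) / noise)
    return r1_max, r2_max, sum_max


def mixed_awgn_mac_bounds(p1, p2, noise_powers):
    """Bounds for a mixture of AWGN channels: the worst (largest) noise governs."""
    return awgn_mac_bounds(p1, p2, max(noise_powers))


def in_region(rate_pair, bounds):
    """Check whether a rate pair (r1, r2) satisfies all three bounds."""
    r1, r2 = rate_pair
    r1_max, r2_max, sum_max = bounds
    return 0.0 <= r1 <= r1_max and 0.0 <= r2 <= r2_max and r1 + r2 <= sum_max


if __name__ == "__main__":
    single = awgn_mac_bounds(1.0, 1.0, 1.0)
    mixed = mixed_awgn_mac_bounds(1.0, 1.0, [1.0, 3.0])
    print("single channel bounds:", single)
    print("mixed channel bounds: ", mixed)
    print("(0.1, 0.1) achievable on the mixed channel:", in_region((0.1, 0.1), mixed))
```

Because every bound decreases as the noise power grows, the region of the mixed channel is contained in the region of each component channel, which matches the role of $N_{\max}$ in Corollary 7.13.2.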



References 



1. N. Abramson. Information Theory and Coding. McGraw-Hill, New York,
1963.

2. R. Ahlswede. The weak capacity of averaged channels.
Wahrscheinlichkeitstheorie und verw. Geb., 11:61-73, 1968.

3. R. Ahlswede. Multi-way communication channels. In Proceedings of 2nd In- 
ternational Symposium on Information Theory, Tsahkadsor, Armenia: Pub- 
lishing House of the Hungarian Academy of Sciences, pages 23-52, 1971. 

4. R. Ahlswede and G. Dueck. Identification via channels. IEEE Transactions
on Information Theory, IT-35(1):15-29, 1989.

5. S. Arimoto. On the converse to the coding theorem for discrete memoryless 
channels. IEEE Transactions on Information Theory, IT-19:357-359, 1973. 

6. R. B. Ash. Information Theory. Dover Publications, New York, 1965. 

7. A. R. Barron. The strong ergodic theorem for densities: generalized Shannon- 
McMillan-Breiman theorem. Annals of Probability, 13(4):1292-1303, 1985. 

8. P. Billingsley. Ergodic Theory and Information. John Wiley & Sons, New
York, 1965.

9. P. Billingsley. Probability and Measure, 3rd ed. John Wiley & Sons, New 
York, 1995. 

10. D. Blackwell, L. Breiman, and A. J. Thomasian. The capacity of a class of
channels. Annals of Mathematical Statistics, 30(4):1229-1241, 1959.

11. R. E. Blahut. Principles and Practice of Information Theory. Addison-Wesley,
Massachusetts, 1988.

12. J. A. Bucklew. Large Deviation Techniques in Decision, Simulation and Esti- 
mation. John Wiley & Sons, New York, 1990. 

13. M. V. Burnashev. On identification capacity of infinite alphabets or
continuous-time channels. IEEE Transactions on Information Theory,
IT-46:2407-2414, 2000.

14. P. N. Chen. General formulas for the Neyman-Pearson type-II error exponent
subject to fixed and exponential type-I error bound. IEEE Transactions on
Information Theory, IT-42(1):316-323, 1996.

15. P. N. Chen and F. Alajaji. Optimistic Shannon coding theorems for arbitrary
single-user systems. IEEE Transactions on Information Theory,
IT-45:2623-2629, 1999.

16. T. M. Cover. A proof of the data compression theorem of Slepian and Wolf for 
ergodic sources. IEEE Transactions on Information Theory, IT-22:226-228, 
1975. 

17. T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, New 
York, 1991. 






18. I. Csiszar. Information theory and ergodic theory. Problems of Control and
Information Theory, 16(1):3-27, 1987.

19. I. Csiszar and J. Korner. Information Theory: Coding Theorems for Discrete 
Memoryless Systems. Academic Press, New York, 1981. 

20. I. Csiszar and G. Longo. On the error exponent for source coding and for 
testing simple statistical hypotheses. Studia Sci. Math. Hungar., 6:181-191, 
1971. 

21. L. D. Davisson, G. Longo, and A. Sgarro. The error exponent for the noiseless 
encoding of finite ergodic Markov sources. IEEE Transactions on Information 
Theory, IT-27(4):431-438, 1981. 

22. A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications. 
Jones and Bartlett Publishers, Boston, 1993. 

23. R. L. Dobrushin. A general formulation of the fundamental Shannon theo-
rem in information theory. Uspehi Mat. Acad. Nauk. SSSR, 40:3-104, 1959.
Translation in Transactions of American Mathematical Society, Series 2,
33:323-438, 1963.

24. H. G. Eggleston. Convexity. Cambridge University Press, Cambridge, 1958. 

25. P. Elias. The efficient construction of an unbiased random sequence. Annals 
of Mathematical Statistics, 43:865-870, 1972. 

26. R. M. Fano. Class notes for transmission of information, Course 6.574. MIT. 

27. R. M. Fano. Transmission of Information. MIT Press, Cambridge, Mass., and
Wiley, New York, 1961.

28. A. Feinstein. A new basic theorem of information theory. IRE Trans. PGIT, 
4:2-22, 1954. 

29. W. Feller. An Introduction to Probability Theory and Its Applications. John
Wiley & Sons, New York, 1957.

30. R. G. Gallager. Information Theory and Reliable Communication. John Wiley 
& Sons, New York, 1968. 

31. R. M. Gray. Entropy and Information Theory. Springer-Verlag, New York,
1990.

32. R. M. Gray and L. D. Davisson. Source coding without the ergodic assump- 
tion. IEEE Transactions on Information Theory, IT-20:502-516, 1974. 

33. R. M. Gray and L. D. Davisson. The ergodic decomposition of stationary 
discrete random processes. IEEE Transactions on Information Theory, IT- 
20(5):625-636, 1974. 

34. R. M. Gray and D. S. Ornstein. Block coding for discrete stationary d- 
continuous noisy channels. IEEE Transactions on Information Theory, IT- 
25(3):292-306, 1979. 

35. T. S. Han. An information-spectrum approach to source coding theorems with 
a fidelity criterion. IEEE Transactions on Information Theory, IT-43(4):1145- 
1164, 1997. 

36. T. S. Han. An information-spectrum approach to capacity theorems for the
general multiple-access channel. IEEE Transactions on Information Theory,
IT-44(7):2773-2795, 1998.

37. T. S. Han. Basic considerations on large deviation theorems. IS Technical
Reports UEC-IS-1998-4, Graduate School of Information Systems, University
of Electro-Communications, Chofu, Tokyo, 182-8585, Japan, October 1998.
(in Japanese).

38. T. S. Han. Hypothesis testing with the general source. IEEE Transactions
on Information Theory, IT-46:2415-2427, 2000.






39. T. S. Han. The reliability functions of the general source with fixed-length
coding. IEEE Transactions on Information Theory, IT-46:2117-2132, 2000.

40. T. S. Han. Theorems on the variable-length intrinsic randomness. IEEE
Transactions on Information Theory, IT-46:2108-2116, 2000.

41. T. S. Han. Weak variable-length source coding. IEEE Transactions on
Information Theory, IT-46:1217-1226, 2000.

42. T. S. Han and M. Hoshi. Interval algorithm for random number generation.
IEEE Transactions on Information Theory, IT-43(2):599-611, 1997.

43. T. S. Han and K. Kobayashi. The strong converse theorem for hypothesis
testing. IEEE Transactions on Information Theory, IT-35(1):178-180, 1989.

44. T. S. Han and O. Uchida. Source code with cost as a nonuniform random
number generator. IEEE Transactions on Information Theory,
IT-46:712-717, 2000.

45. T. S. Han and S. Verdú. New results in the theory of identification via
channels. IEEE Transactions on Information Theory, IT-38(1):14-25, 1992.

46. T. S. Han and S. Verdú. Approximation theory of output statistics. IEEE
Transactions on Information Theory, IT-39(3):752-772, 1993.

47. T. S. Han and S. Verdú. Spectrum invariancy under output approximation
for full-rank discrete memoryless channels. Problemy Peredachi Informatsii,
29(2):9-27, 1993. (in Russian).

48. T. Hashimoto. Source coding for average rate and average distortion: New
variable-length coding theorems. IEEE Transactions on Information Theory,
IT-29(3):785-792, 1983. Correction to “Source coding for average rate and
average distortion: New variable-length coding theorems.”

49. F. S. Hill, Jr. and M. A. Blanco. Random geometric series and intersymbol
interference. IEEE Transactions on Information Theory, IT-19(3):326-335,
1973.

50. W. Hoeffding. Asymptotically optimal test for multinomial distributions. An- 
nals of Mathematical Statistics, 36:369-400, 1965. 

51. G. D. Hu. On Shannon theorem and its converse for sequences of communi-
cation schemes in the case of abstract random variables. In Proceedings of
the 3rd Prague Conference on Information Theory, Statistics, Decision Func- 
tions, Random Processes, pages 285-333, Czechoslovak Academy of Sciences, 
Prague: 1964. 

52. D. A. Huffman. A method for the construction of minimum redundancy codes. 
Proceedings of IRE, 40:1098-1101, 1952. 

53. S. Ihara. Information Theory for Continuous Systems. World Scientific, New
Jersey, 1993.

54. K. Iriyama. Probability of error for fixed-length source coding of general
sources. IEEE Transactions on Information Theory, IT-47:1537-1543, 2001.

55. F. Jelinek. Probabilistic Information Theory. McGraw-Hill, New York, 1968. 

56. J. C. Kieffer. A general formula for the capacity of stationary nonanticipatory 
channels. Information and Control, 26:381-391, 1974. 

57. J. C. Kieffer. Finite-state adaptive block to variable-length noiseless coding 
of a nonstationary information source. IEEE Transactions on Information 
Theory, IT-35:1259-1263, 1989. 

58. D. Knuth and A. Yao. The complexity of nonuniform random number gener-
ation. In J. F. Traub, editor, Algorithms and Complexity, New Directions and
Results, pages 357-428. Academic Press, New York, 1976.






59. L. G. Kraft. A device for quantizing, grouping, and coding amplitude-
modulated pulses. Master’s thesis, Dept. Electrical Engineering, MIT,
Cambridge, Massachusetts, 1949.

60. K. Kurosawa and T. Yoshida. Universal hashing and identification codes via
channels. IEEE Transactions on Information Theory, IT-45(6):2091-2095,
1999.

61. A. Leon-Garcia, L. D. Davisson, and D. L. Neuhoff. New results on coding 
of stationary nonergodic sources. IEEE Transactions on Information Theory, 
IT-25(2):137-144, 1979. 

62. G. Longo and A. Sgarro. The source coding revisited: a combinatorial
approach. IEEE Transactions on Information Theory, IT-25:544-548, 1979.

63. K. M. Mackenthun and M. B. Pursley. Strongly and weakly universal source
coding. In Proceedings of the 1977 Conference on Information Science and 
Systems, pages 286-291, Johns Hopkins University, 1977. 

64. R. J. McEliece. The Theory of Information and Coding. Addison-Wesley,
Reading, MA, 1977.

65. B. McMillan. Two inequalities implied by unique decipherability. IRE
Transactions on Information Theory, IT-2:115-116, 1956.

66. N. Merhav. Universal coding with minimum probability of codeword length 
overflow. IEEE Transactions on Information Theory, IT-37(3):556-563, 1991. 

67. S. Miyake and F. Kanaya. Coding theorems on correlated general sources.
IEICE Transactions on Fundamentals, E78-A(9):1063-1070, September 1995.

68. H. Nagaoka. Seminar notes, Graduate School of Information Systems,
University of Electro-Communications, Tokyo, 1996.

69. H. Nagaoka and M. Hayashi. An information-spectrum approach to classical
and quantum hypothesis testing. IS Technical Reports UEC-IS-2000-5, The
University of Electro-Communications, 2000.

70. H. Nagaoka and S. Miyake. Approximation of stochastic processes and
information spectra. In Proceedings of 19th Symposium on Information Theory
and its Applications, pages 117-120, Hakone, Japan: 1996.

71. K. Nakagawa and F. Kanaya. On the converse theorem in statistical hypothesis 
testing for Markov chains. IEEE Transactions on Information Theory, IT- 
39(2):629-633, 1993. 

72. S. Natarajan. Large deviations, hypotheses testing, and source coding for finite 
Markov chains. IEEE Transactions on Information Theory, IT-31(3):360-365, 
1985. 

73. J. Nedoma. The capacity of a discrete channel. In Proceedings of the 1st Prague 
Conference on Information Theory, Statistics, Decision Functions, Random 
Processes, pages 143-181, Czechoslovak Academy of Sciences, Prague, 1957. 

74. J. Neyman and E. S. Pearson. On the problem of the most efficient tests of 
statistical hypotheses. Phil. Trans. Royal Soc. London, Series A, 231:289-337, 
1933. 

75. K. R. Parthasarathy. Effective entropy rate and transmission of information
through channels with additive random noise. Sankhya, A25(1):75-84, 1963.

76. M. S. Pinsker. Information and Information Stability of Random Variables 
and Processes. Holden-Day, San Francisco, 1964. 

77. C. E. Shannon. A mathematical theory of communication. Bell System Tech- 
nical Journal, 27:379-423, 623-656, 1948. 

78. C. E. Shannon. Certain results in coding theory for noisy channels. Informa- 
tion and Control, 1:6-25, 1957. 






79. C. E. Shannon. Coding theorems for a discrete source with a fidelity criterion. 
IRE Nat. Conv. Rec., part 4:142-163, 1959. 

80. C. E. Shannon. Probability of error for optimal codes in a Gaussian channel. 
Bell System Technical Journal, 38:611-656, 1959. 

81. P. C. Shields. The Ergodic Theory of Discrete Sample Paths. American
Mathematical Society, Providence, Rhode Island, 1996.

82. P. C. Shields, D. L. Neuhoff, L. D. Davisson, and F. Ledrappier. The 
distortion-rate function for nonergodic sources. Annals of Probability, 
6(1):138-143, 1978. 

83. D. Slepian and J. K. Wolf. Noiseless coding of correlated information sources. 
IEEE Transactions on Information Theory, IT-19:471-480, 1973. 

84. Y. Steinberg. New converses in the theory of identification via channels. IEEE 
Transactions on Information Theory, IT-44(3):984-998, May 1998. 

85. Y. Steinberg and S. Verdú. Simulation of random processes and rate-distortion
theory. IEEE Transactions on Information Theory, IT-42(1):63-86, 1996.

86. V. Strassen. The existence of probability measures with given marginals. 
Annals of Mathematical Statistics, 36:423-439, 1965. 

87. O. Uchida and T. S. Han. The optimal overflow and underflow probabili-
ties of variable-length coding for the general source. IEICE Transactions on
Fundamentals, E84-A:2457-2465, 2001.

88. S. Vembu and S. Verdú. Generating random bits from an arbitrary source:
fundamental limits. IEEE Transactions on Information Theory,
IT-41(5):1322-1332, 1995.

89. S. Vembu, S. Verdú, and Y. Steinberg. The source-channel separation theorem
revisited. IEEE Transactions on Information Theory, IT-41(1):44-54, 1995.

90. S. Verdú. Private communication, 1993.

91. S. Verdú and T. S. Han. A general formula for channel capacity. IEEE
Transactions on Information Theory, IT-40(4):1147-1157, 1994.

92. S. Verdú and T. S. Han. The role of the asymptotic equipartition property
in noiseless source coding. IEEE Transactions on Information Theory,
IT-43(3):847-857, 1997.

93. S. Verdú and V. K. Wei. Explicit construction of optimal constant-weight
codes for identification via channels. IEEE Transactions on Information
Theory, IT-39(1):30-36, 1993.

94. K. Visweswariah, S. Kulkarni, and S. Verdú. Source codes as random num-
ber generators. IEEE Transactions on Information Theory, IT-44(2):462-471,
1998.

95. K. Winkelbauer. Communication channels with finite past history. Transac- 
tions of the Second Prague Conference on Information Theory, Prague, pages 
685-831, 1960. 

96. K. Winkelbauer. On the asymptotic rate of non-ergodic information sources. 
Kybernetika, 6:127-148, 1970. 

97. K. Winkelbauer. On the coding theorem for decomposable discrete
information channels I. Kybernetika, 7(2):109-123, 1971.

98. K. Winkelbauer. On the coding theorem for decomposable discrete informa- 
tion channels II. Kybernetika, 7(2):230-255, 1971. 

99. K. Winkelbauer. On the regularity condition for decomposable communication
channels. Kybernetika, 7(2):314-327, 1971.

100. J. Wolfowitz. The coding of messages subject to chance errors. Illinois Journal 
of Mathematics, 1:591-606, 1957. 






101. J. Wolfowitz. On channels without capacity. Information and Control, 6:49- 
54, 1963. 

102. J. Wolfowitz. Coding Theorems of Information Theory, 3rd ed.
Springer-Verlag, New York, 1978.

103. A. D. Wyner. Recent results in the Shannon theory. IEEE Transactions on 
Information Theory, IT-20:2-10, 1974. 

104. H. Yamamoto. Private communication, 1998. 

105. J. Ziv. The capacity of the general time-discrete channel with finite alphabet. 
Information and Control, 14:233-251, 1969. 



Index 



acceptance region, 269 
achievable rate region, 455, 464 
achievable rate region, 472 
achievable rate region for the compound 
source, 467 

additive, 227, 333, 369, 378, 393, 521 
additive channel, 181, 218, 486 
additive cost, 227, 521 
additive cost constraint, 227, 233 
additive distortion measure, 337, 375, 
376, 379 

additive multiple-access channel, 486, 
500, 512 

additive non-white Gaussian noise 
channel, 241, 435 

additive white Gaussian noise, 525 
additive white Gaussian noise channel, 
236, 237, 425, 432, 433, 443, 444, 450 
address, 436-438, 446, 450, 451 
address set, 440 

almost-sure convergence, 18, 19, 286 
alternative hypothesis, 269 
asymptotic equipartition property, 43 
autoregressive process, 83, 96, 305, 317 
average distortion, 326, 333, 334 
average error probability, 214, 439 

binary entropy, 5, 59, 105 
binary symmetric channel, 182, 217, 
221 

binary symmetric stationary memory- 
less channel, 416, 420 
boundedness of distortion measure, 337 

capacity region, 483 
^-capacity region, 510 
chain rule of entropy, 331 
channel capacity, 170, 415, 419, 443 



ε-channel capacity, 210, 218, 399, 416
ε-channel capacity of mixed channel,
216

Γ-cost channel capacity, 431
channel capacity of compound channel, 
198 

channel coding, 170, 248, 395 
channel resolvability, 395, 396, 405, 
406, 413, 415-418, 425, 428 
channel δ-resolvability, 405
(δ, Γ)-channel resolvability, 423
channel resolvability problem, 445 
channel resolvability theorem, 413 
channel resolvability with cost 
constraint, 422, 424, 429 
channel with cost constraint, 233, 420, 
421, 425 

Chebyshev’s inequality, 3, 4, 18-20, 
171, 186, 188, 235, 240, 328, 370-372, 
409, 410, 494 
Chernoff’s ^-distance, 303 
clean randomness, 120 
code, 1, 169, 326, 396, 454, 481 
code alphabet, 8, 348 
codeword, 1, 169, 326, 396, 481 
coding rate, 2, 9, 170, 322, 326, 335, 
396, 438, 455, 482 
coding system, 403 
coin random number, 106, 120, 121, 
126, 128, 142, 143 

compound alternative hypothesis, 397 
compound channel, 198, 205 
compound hypothesis testing, 278 
compound multiple access channel, 500 
compound source, 277, 466, 467 
conditional cumulative distribution 
function, 364 

conditional divergence, 72, 152, 294 






conditional divergence distance, 160, 
165 

conditional entropy, 5, 72 
conditional probability, 169 
consistency, 14, 489 
continuous input channel, 425 
continuous spectrum, 131, 135 
continuously mixed channel, 217 
convergence in probability, 15, 19 
converse part, 7 
converse theorem, 254, 448 
convex function, 359 
convexity, 392 
cost, 225 

(ε, Γ)-cost channel capacity, 232
Γ-cost channel capacity, 226, 420, 422,
424 

(/i, P)-cost channel capacity, 444 
cost constraint, 232, 237, 421, 423-425, 
428, 429, 431, 443, 444, 449, 450, 520 
cost function, 520 

countably infinite, 19, 54, 76, 79, 93, 
302, 320 

counting measure, 320 
Cramer’s theorem, 79, 82, 302, 304, 
306, 319 

critical region, 269 
crossover probability, 182, 221 

decay exponentially, 102 
decay as a power function, 102 
decode, 396 

decoder, 8, 155, 170, 249, 326, 455, 482 
decoding, 457, 502 
decoding function, 353, 361 
decoding region, 169, 396, 397, 401, 
403, 414, 417, 429, 438, 445, 447, 
448, 450, 451, 482 
degenerate, 417 
deterministic encoder, 443 
deterministic variable-length encoder,
367

diagonal line argument, 56, 90, 341, 
357, 460 
direct part, 7 

direct theorem, 252, 399, 406, 413, 422, 
423, 441, 447 
dirty randomness, 120 
discrete uniform random number, 119 



distortion, 326, 333 
distortion measure, 326, 333 
fa-distortion-rate function, 334
fm-distortion-rate function, 334
va-distortion-rate function, 335
vm-distortion-rate function, 335
divergence, 12, 238, 276, 279, 426 
divergence density rate, 271, 286, 302 
divergence distance, 150, 152 
divergence rate, 277 
divergence-spectrum, 271, 277 
divergence spectrum-inf, 275 
domination, 260, 261 
double-exponential function, 395, 397, 
403, 446 

dual coding rate, 322 

Eggleston’s theorem, 493 
empirical distribution, 98 
encoder, 1, 8, 155, 169, 249, 326, 454, 
481 

encoding, 502 

encoding function, 352, 361 
entropy, 3 
ε-entropy, 55
entropy density rate, 15 
entropy rate, 19, 33, 44, 48, 164, 463, 
470 

entropy-spectrum, 15 
ε-channel coding, 210
ε-hypothesis testing, 279
equivalence of necessary conditions, 260 
equivalence of sufficient conditions, 257 
error probability, 2, 155, 164, 170, 249, 
455, 482 

error probability of the first kind, 269, 
287, 289, 308, 397, 402, 439 
error probability of the second kind, 
269, 287, 289, 308, 397, 402, 439 
essential infimum, 34, 208 
essential supremum, 33, 382 
evaluation of coding performance, 353 
evaluation of error probability, 457, 502 

Fano inequality, 5, 173, 419 
Fatou’s lemma, 27, 199
Feinstein’s lemma, 214 
Fisher information matrix, 426, 432 
fixed-length code, 7 






fixed-length coding, 165, 333
fixed-length coding under the average
distortion criterion, 334
fixed-length encoder, 160
fixed-length encoding function, 352
fixed-length intrinsic randomness
problem, 141

fixed-length source coding problem, 163
fixed-length uniform random number,
141, 143

fixed-length coding under the maximum 
distortion measure, 333 
fixed-length source coding, 320 
full-rank channel, 412 
fundamental theorem, 416, 424, 431, 
443 

Gaussian distribution, 306 
Gaussian random variable, 379 
general channel, 176 
general source, 14, 110-112, 154, 333, 
340, 343, 368 

generalization of Feinstein’s lemma, 249 
generalization of Verdu-Han’s lemma, 
253 

generalized hypothesis testing problem, 
319 

generating rate, 142 
geometric distribution, 81, 83, 96 

half-space, 317 

Hamming distortion, 375, 393 
Hoeffding’s theorem, 305 
hypothesis testing for mixed sources, 
273, 302 

hypothesis testing problem, 397 

channel capacity, 417 
identification capacity, 398, 413, 
415-417, 425, 449 
(μ, λ)-identification capacity, 398
(μ, λ, Γ)-identification capacity, 422
identification capacity with cost 
constraint, 424, 428 
identification channel capacity, 421 
identification code, 395, 396, 438 
identification code capacity theorem, 
413 



identification code with cost constraint, 
429 

identification coding problem with cost 
constraint, 421 

identification coding system, 403 
identification coding with cost 
constraint, 422 

(μ, λ)-identification-transmission
capacity region, 440, 443, 444, 447, 
449, 450 

identification-transmission code, 
438-440, 444-447, 449-451 
identification-transmission coding, 441
independent, 484 
individual ergodic theorem, 377 
inf-entropy rate, 143 
inf-divergence rate, 285 
infimum achievable fixed-length coding 
rate, 2, 43, 160, 264 
ε-infimum achievable fixed-length
coding rate, 38

infimum achievable self-random number 
generating rate, 156 
infimum achievable variable-length 
coding rate, 9 

infimum achievable weak variable-length
coding rate, 51
infimum coding rate, 327
infimum r-achievable correct probability
exponents, 309
infimum r-achievable fixed-length
coding rate, 64, 86, 102
information processing inequality, 173 
information-spectrum, 15, 97, 158, 177, 
189, 271 

information-spectrum approach, 24 
information-spectrum slicing, 66, 87, 
143, 289, 350, 360 
initial state, 73 
input alphabet, 169 
interval algorithm, 104 
intrinsic randomness, 118, 120, 121 
intrinsic randomness problem, 119, 121, 
124, 126, 132, 141, 152, 160 
δ-intrinsic randomness problem, 132,
141

invariancy, 97 
invariant theory, 100 






joint source-channel coding, 247, 248 
joint type, 201, 470 

K-ary interval, 10
Khintchin, 18

Khintchin’s law of large numbers, 19,
188, 194, 228, 229, 281, 285, 475, 489
Kraft inequality, 12
Kuhn-Tucker theorem, 419

large deviation, 63, 85, 286, 308 
large deviation theory, 63 
law of large numbers, 3, 18 
Levy distance, 114 
likelihood-ratio density rate, 271 
limit inferior in probability, 14 
limit superior in probability, 14 
log-sum inequality, 12, 140 

M-type, 118 

Markov chain, 193, 208, 228, 249, 258 
Markov’s inequality, 4, 208 
Markov’s reverse inequality, 355 
Markov type, 32, 73 
maximum distortion, 333 
maximum entropy theorem, 238, 379, 
525 

maximum error probability, 214, 401, 
439 

mean ergodic theorem, 377 
memoryless, 488

message, 169, 436-439, 446, 450, 451, 
481 

message set, 440, 446, 447, 449 
mixed channel, 190, 194, 200, 416, 495, 
498 

mixed hypothesis testing, 273, 277 
mixed source, 20, 25, 27, 28, 33, 34, 36, 
40, 41, 44, 75, 76, 84, 95, 127, 131, 
134, 135, 274-277, 282, 296, 297, 299, 
317, 318, 380, 382, 383, 385, 389, 
393, 464, 465, 468, 469 
moment generating function, 80, 302, 
306 

multi-terminal information theory, 453 
multi-user information theory, 453 
multiple-access channel, 454, 481
^-multiple access channel, 509 
mutual information, 6, 170, 341, 369 



mutual information density rate, 177, 
189, 224 

mutual information inequality, 183, 370 
mutual information rate, 204 
mutual information spectrum, 177 

noise power, 237, 525 
noise process, 181 

non-increasing property of the spectral
sup/inf-entropy rate, 113
nonergodic, 14, 42, 177, 218 
nonstationary, 14, 42, 177, 218 
nonstationary and nonergodic 
information theory, 395 
nonstationary memoryless channel, 187
nonstationary memoryless multiple-access
channel, 494
nonstationary memoryless source, 19, 
274, 374 
norm, 426, 427 

normalized divergence, 121, 136 
normalized divergence distance, 124, 
136, 141, 153, 162, 167 
null hypothesis, 269 

one-point spectrum, 283 
optimistic coding, 263 
output alphabet, 169 
output distribution approximation, 405 
output distribution approximation 
problem, 418 

pessimistic coding, 264 
prefix code, 58, 164 

probability distribution approximation, 
106, 129 

probability distribution approximation 
problem, 154 
probability measure, 319 
probability transition density, 432 
projection, 70, 73, 74, 81, 93, 95, 292, 
294, 295, 297, 298, 304, 315, 317, 318 

quantization of probability distribution, 
426, 444 

random code, 171, 250, 350, 360 
random coding, 328, 340, 344, 351, 360, 
406, 457, 502 






random number generation, 103, 106, 
154 

random number generation problem, 
118, 158 

random number generator, 160, 164, 
446 

random number reproduction, 127 
rate-distortion function, 326, 357, 382 
fa-rate-distortion function, 334, 382,
385

fm-rate-distortion function, 334, 380,
382

va-rate-distortion function, 335, 389,
393

vm-rate-distortion function, 335, 385,
389

rate-distortion problem, 325 
rate-distortion theory, 325 
rate function, 79, 302, 306, 319 
reference word, 336, 337, 361, 387 
reliability function, 64, 86 
renaming, 97 
Renyi ^-entropy, 80 
reproduced information, 155 
reproduction alphabet, 326, 333, 337, 
340, 343, 357, 369, 371, 375, 377, 393 
reproduction process, 340 
resolvability, 118, 119, 136 
δ-resolvability, 129, 132, 136, 405
resolvability problem, 119, 124, 163, 
404 

^-resolvability problem, 141 
reversible process, 128 

Sanov’s theorem, 70, 81, 82, 93, 96, 
292, 294, 297, 298, 302, 304, 318, 319 
scaling factor, 102 
Schwarz inequality, 80 
secrecy of communication, 437 
secure communication, 450 
self-random number, 156, 158 
self-random number generation 
problem, 156 
separate coding, 455, 482 
separation principle, 248, 257, 261 
Shannon-Fano-Elias code, 11
signal power, 237, 525 
simultaneous coding, 482 
simultaneous test, 397 



single-user information theory, 453 
size of uniform random number, 118 
Slepian-Wolf coding system, 454 
source alphabet, 1, 325, 333, 337, 369, 
375, 377, 393 

source coding, 1, 154, 248, 320 
source coding problem, 158 
spectral inf-divergence rate, 271 
spectral inf-entropy rate, 35, 61, 110, 
478 

spectral inf-mutual information rate,
177, 225, 484

spectral sup-entropy rate, 15, 61, 110, 
455 

spectral sup-information rate, 340 
spectral sup-mutual information rate, 
219, 406, 518 

spectral sup-divergence rate, 283 
stability, 425, 443 
standard space, 44 
state transition function, 73 
stationarity, 72 

stationary ergodic, 36, 49, 63, 377, 383, 
386, 393 

stationary ergodic source, 19, 27, 40, 
131, 134, 286, 376, 380, 382, 393, 
466, 481 

stationary Gaussian source, 379 
stationary irreducible Markov source, 
19, 28, 32, 71, 79, 93, 286, 293, 302, 
316 

stationary Markov source, 28 
stationary memoryless, 36, 488 
stationary memoryless channel, 169, 
182, 233, 412, 416, 425, 437, 450 
stationary memoryless Gaussian source, 
319 

stationary memoryless multiple-access 
channel, 521 

stationary memoryless source, 1, 18, 29,
44, 70, 93, 273, 275, 281, 292, 297, 
302, 315, 317, 319, 325, 369, 375, 
378, 380, 481 
stationary process, 376 
stationary source, 20, 389 
Stein’s lemma, 282 
stochastic, 357 






stochastic encoder, 252, 357, 361, 366, 
396, 443, 446-448 
stochastic process, 14 
strict domination, 257, 261 
strong converse property, 35, 37, 40, 48, 
63, 124, 126-128, 131, 134, 160, 218, 
220, 232, 233, 237, 241, 283, 285, 
286, 416, 419, 420, 424, 425, 431, 
435, 444, 445, 448, 449, 478, 513, 520 
strong converse theorem, 125, 126, 219, 
283 

subadditive, 367, 376-378, 383, 385, 
386, 389, 390, 393 

subadditive distortion measure, 376, 
377 

sup-entropy rate, 44, 55, 61 
supremum achievable error probability 
exponent, 271 

supremum achievable exponent, 278 
supremum achievable exponent in 
compound hypothesis testing, 278 
supremum ε-achievable error probability
exponent, 279
supremum of r-achievable error
exponents, 287

supremum r-achievable fixed-length
dual coding rate, 323
synchronous vibration, 257 

target random number, 106, 118, 124, 
128, 131 

test codeword, 366 
time-sharing principle, 437, 456, 485 
transition probability, 169 
transmission code, 395-398, 401, 403, 
404, 418, 435, 437, 445, 447 
transmission rate, 170 
two-stage coding, 257 
two-stage encoder, 248



two-state Markov, 150 
type, 30, 75 
type theory, 98 
typical sequence, 3, 171 

unifilar finite-state source, 73, 94, 294, 
316 

uniform integrability, 48, 49, 51, 52, 59, 
61, 63, 336-338, 343, 357, 367, 376, 
377, 382, 387, 389, 392 
uniform random number, 118-120, 128, 
141, 142, 160, 163, 404, 405 
uniformly bounded variance, 174 
upper cumulative probability, 255 

variable-length code, 8, 9, 350 
variable-length coding, 164, 333 
variable-length coding under the 
average distortion criterion, 335 
variable-length coding under the 
maximum distortion criterion, 334 
variable-length intrinsic randomness, 
143, 151 

variable-length intrinsic randomness 
problem, 142, 153, 164 
variable-length source coding, 153, 164 
variable-length uniform random 
number, 142, 143 

variable-length uniform random number 
generating rate, 167 
variational distance, 106, 114, 119-121, 
124, 129, 132, 141, 143, 152, 154, 
162, 167, 404, 407, 414, 418, 426 

weak prefix code, 51, 56 
weak sup-entropy rate, 55, 61 
weak variable-length code, 51 
weak variable-length coding, 168 
weak variable-length encoder, 168 



